Preface


It was our privilege to serve as the program chairs for CAV 2022, the 34th 
International Conference on Computer-Aided Verification. CAV 2022 was held during 
August 7-10, 2022. CAV-affiliated workshops were held from July 31 to August 1
and from August 11 to August 12. This year, CAV was held as part of the Federated
Logic Conference (FLoC) and was co-located with many other conferences in
software/hardware verification and logic for computer science. Due to the easing of
COVID-19 travel restrictions, CAV 2022 and the rest of FLoC were in-person events.

CAV is an annual conference dedicated to the advancement of the theory and practice 
of computer-aided formal analysis methods for hardware and software systems. The 
primary focus of CAV is to extend the frontiers of verification techniques by expanding 
to new domains such as security, quantum computing, and machine learning. This puts 
CAV at the cutting edge of formal methods research, and this year’s program is a reflection
of this commitment.

CAV 2022 received a high number of submissions (209). We accepted nine tool 
papers, two case studies, and 40 regular papers, which amounts to an acceptance rate 
of roughly 24%. The accepted papers cover a wide spectrum of topics, from theoretical 
results to applications of formal methods. These papers apply or extend formal methods 
to a wide range of domains such as smart contracts, concurrency, machine learning, 
probabilistic techniques, and industrially deployed systems. The program featured a 
keynote talk by Ziyad Hanna (Cadence Design Systems and University of Oxford), a 
plenary talk by Aarti Gupta (Princeton University), and invited talks by Arie Gurfinkel 
(University of Waterloo) and Neha Rungta (Amazon Web Services). Furthermore, we 
continued the tradition of Logic Lounge, a series of discussions on computer science 
topics targeting a general audience. In addition to all talks at CAV, the attendees got 
access to talks at other conferences held as part of FLoC. 

In addition to the main conference, CAV 2022 hosted the following workshops: 
Formal Methods for ML-Enabled Autonomous Systems (FOMLAS), On the Not So 
Unusual Effectiveness of Logic, Formal Methods Education Online, Democratizing 
Software Verification (DSV), Verification of Probabilistic Programs (VeriProP), 
Program Equivalence and Relational Reasoning (PERR), Parallel and Distributed 
Automated Reasoning, Numerical Software Verification (NSV-XV), Formal Reasoning 
in Distributed Algorithms (FRIDA), Formal Methods for Blockchains (FMBC), 
Synthesis (Synt), and Workshop on Open Problems in Learning and Verification of 
Neural Networks (WOLVERINE). 

Organizing a flagship conference like CAV requires a great deal of effort from the 
community. The Program Committee (PC) for CAV 2022 consisted of 86 members — a 
committee of this size ensures that each member has a reasonable number of papers to 
review in the allotted time. In all, the committee members wrote over 800 reviews while 
investing significant effort to maintain and ensure the high quality of the conference 
program. We are grateful to the CAV 2022 PC for their outstanding efforts in evaluating 
the submissions and making sure that each paper got a fair chance. As in recent years at
CAV, we made the artifact evaluation mandatory for tool paper submissions and optional
but encouraged for the rest of the accepted papers. The Artifact Evaluation Committee 
consisted of 79 reviewers who put in significant effort to evaluate each artifact. The goal 
of this process was to provide constructive feedback to tool developers and help make 
the research published in CAV more reproducible. The Artifact Evaluation Committee 
was generally quite impressed by the quality of the artifacts. Among the accepted regular 
papers, 77% of the authors submitted an artifact, and 58% of these artifacts passed the 
evaluation. We are very grateful to the Artifact Evaluation Committee for their hard work 
and dedication in evaluating the submitted artifacts. 

CAV 2022 would not have been possible without the tremendous help we received 
from several individuals, and we would like to thank everyone who helped make CAV 
2022 a success. First, we would like to thank Maria A. Schett and Daniel Dietsch for
chairing the Artifact Evaluation Committee and Hari Govind V K for putting together the
proceedings. We also thank Grigory Fedyukovich for chairing the workshop organization
and Shachar Itzhaky for managing publicity. We would like to thank the FLoC organizing
committee for organizing the Logic Lounge and the Mentoring workshop, and for arranging
student volunteers. We also thank Hana Chockler for handling sponsorship for all conferences
in FLoC. We would also like to thank FLoC chair Alexandra Silva and co-chairs Orna 
Grumberg and Eran Yahav for the support provided. Last but not least, we would like 
to thank members of the CAV Steering Committee (Aarti Gupta, Daniel Kroening, 
Kenneth McMillan, and Orna Grumberg) for helping us with several important aspects 
of organizing CAV 2022. 

We hope that you will find the proceedings of CAV 2022 scientifically interesting 
and thought-provoking! 


June 2022
Sharon Shoham
Yakir Vizel
Abstract. Amazon Web Services (AWS) is a cloud computing services
provider that has made significant investments in applying formal methods
to proving the correctness of its internal systems and providing assurance
of correctness to its end users. In this paper, we focus on how we built
abstractions and eliminated specifications to scale a verification engine
for AWS access policies, ZELKOVA, to be usable by all AWS users. We
present milestones from our journey from a thousand SMT invocations
daily to an unprecedented billion SMT calls daily in a span of five years.
We talk about how the cloud is enabling the application of formal
methods, share key insights into what made this scale of a billion SMT
queries daily possible, and present some open scientific challenges for the
formal methods community.


Keywords: Cloud Computing · Formal Verification · SMT Solving


1 Introduction 


Amazon Web Services (AWS) has made significant investments in developing and
applying formal tools and techniques to prove the correctness of critical internal
systems and to provide services that let AWS users prove the correctness of their
own systems [24]. We use and apply a varied set of automated reasoning techniques
at AWS. For example, we use (i) bounded model checking [35] to verify memory
safety properties of boot code running in AWS data centers and of a real-time
operating system used in IoT devices [22,25,26], (ii) proof assistants such as
EasyCrypt [12] and domain-specific languages such as Cryptol [38] to verify
cryptographic protocols [3,4,23], (iii) HOL Light [33] to verify the BigNum
implementation [2], (iv) P [28] to test key storage components in Amazon S3 [18],
and (v) Dafny [37] to verify key authorization and crypto libraries [1]. Automated
reasoning capabilities for external AWS users leverage (i) data-flow analysis [17]
to prove correct usage of cloud APIs [29,40], (ii) monotonic SAT theories [14] to
check properties of network configurations [5,13], and (iii) theories for strings
and automata in SMT solvers [16,39,46] to provide security for access controls [6,19].

This paper describes key milestones in our journey to generating a billion SMT
queries a day in the context of AWS Identity and Access Management (IAM).
IAM is a system for controlling access to resources such as applications, data,
and workloads in AWS. Resource owners can configure access by writing policies
that describe when to allow and deny user requests that access the resource.
These configurations are expressed in the IAM policy language. For example,
Amazon Simple Storage Service (S3) is an object storage service that offers data
durability, availability, security, and performance. S3 is used widely to store and
protect data for a range of applications. A bucket is a fundamental container in S3
where users can upload unlimited amounts of data in the form of objects. Amazon
S3 supports fine-grained access control to the data based on the needs of the user.
Ensuring that only intended users have access to a resource is important for its
security. While the policy language allows for compact specifications of expressive
policies, the interactions between the semantics of different policy statements can
be challenging to evaluate manually, especially in large policies with multiple
operators and conditions.

To help AWS users secure their resources, we built ZELKOVA, a policy analysis
tool designed to reason about the semantics of AWS access control policies.
ZELKOVA translates policies and properties into Satisfiability Modulo Theories
(SMT) formulas and uses SMT solvers to prove a variety of security properties
such as “Does the policy grant broad public access?” [6]. The SMT encoding uses
the theory of strings, regular expressions, bit vectors, and integer comparisons.
The use of the wildcards * (any number of characters) and ? (exactly one
character) in the string constraints makes the decision problem PSPACE-complete.
ZELKOVA uses a portfolio solver: it invokes multiple solvers in the backend and
uses the result from the solver that returns first, in a winner-takes-all strategy.
This allows us to leverage the diversity among solvers and solve queries quickly,
in a couple hundred milliseconds to tens of seconds. A sample of AWS services
that integrate ZELKOVA includes Amazon S3 (object storage), AWS Config
(change-based resource auditor), Amazon Macie (security service), AWS Trusted
Advisor (compliance to AWS best practices), and Amazon GuardDuty (intelligent
threat detection). ZELKOVA drives preventative control features such as Amazon
S3 Block Public Access and provides visibility into who outside an account has
access to its resources [19].
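The winner-takes-all portfolio strategy can be sketched with Python's `concurrent.futures`. This is a minimal, hypothetical sketch, not ZELKOVA's implementation: the `fake_*_solver` functions are stand-ins for real SMT back ends, the 30-second default mirrors the time bound mentioned later in this paper, and because Python threads cannot be killed, losing solvers are simply ignored rather than terminated as a production portfolio of solver processes would be.

```python
import concurrent.futures as cf
import time


def portfolio_solve(query, solvers, timeout_s=30.0):
    """Winner-takes-all portfolio: run every solver on the same query and return
    (solver_name, answer) from the first solver that finishes without an error.
    Returns None if no solver answers within timeout_s."""
    pool = cf.ThreadPoolExecutor(max_workers=len(solvers))
    futures = {pool.submit(solve, query): name for name, solve in solvers.items()}
    try:
        for future in cf.as_completed(futures, timeout=timeout_s):
            if future.exception() is None:  # skip solvers that crashed
                return futures[future], future.result()
        return None  # every solver failed
    except cf.TimeoutError:
        return None  # no answer in time; the caller must handle this soundly
    finally:
        # Stop waiting for the losers. Queued solvers are cancelled, but running
        # threads cannot be killed; a real portfolio would kill solver processes.
        pool.shutdown(wait=False, cancel_futures=True)


# Hypothetical stand-ins for SMT solver back ends.
def fake_fast_solver(query):
    time.sleep(0.01)
    return "unsat"


def fake_slow_solver(query):
    time.sleep(0.3)
    return "sat"


def fake_crashing_solver(query):
    raise RuntimeError("solver crashed")
```

A crashing solver does not decide the outcome; the portfolio keeps waiting for the next solver to finish. Diversity pays off because, for each query, only the fastest member needs to succeed.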

ZELKOVA is an automated reasoning tool developed by formal methods 
experts and requires some degree of expertise in formal methods to use it. We 
cannot expect all AWS users to be experts in formal methods, have the time to 
be trained in the use of formal methods tools, or even be experts in the cloud 
domain. In this paper, we present the three pillars of our solution that enable 
ZELKOVA to be used by all AWS users. Using a combination of techniques such
as eliminating specifications, domain-specific abstractions, and advances in SMT
solvers, we make the power of ZELKOVA available to all AWS users.


2 Eliminate Writing Specifications 
End users will not write a specification

ZELKOVA follows a traditional verification approach: it takes as input a
policy and a specification, and produces a yes or no answer. Developers and
cloud administrators author policies to govern access to cloud resources.
Someone else, a security engineer, writes a specification of what is considered
acceptable. The automated reasoning engine ZELKOVA does the verification and
returns a yes or no answer. This approach is effective for a limited number of
use cases, but it is hard to scale to all AWS users. The bottleneck to scaling
the verification effort is the human effort required to specify what is
acceptable behavior. The SLAM work made a similar observation about
specifications: for Static Driver Verifier to be used, its developers had to
provide the specification as well as the tool [7]. A person has to put in a lot
of work upfront to define acceptable behavior, and only at the end of the process
do they get back an answer: a Boolean. It’s a single bit of information for all
the work they’ve put in. They have no information about whether they had the
right specification or whether they wrote the specification correctly.

- Effect: Allow
  Condition:
    StringEquals:
      SrcVpc:
        - vpc-a
        - vpc-b
- Effect: Allow
  Condition:
    StringEquals:
      OrgID: o-2
- Effect: Deny
  Condition:
    StringEquals:
      SrcVpc: vpc-b
    StringNotEquals:
      OrgID: o-1

Fig. 1. An example AWS policy

Fig. 2. Stratified abstraction search tree

To scale our approach to all AWS users, we had to fundamentally rethink 
our approach and completely remove the bottleneck of having people write a 
specification. To achieve that, we flipped the rules of the game and made the 
automated reasoning engine responsible for specification. We had the machine 
put in the upfront cost. Now it takes as input a policy and returns a detailed 
set of findings (declarative statements about what is true of the system). These 
findings are presented to a user, the security engineer, who reviews these findings 
and makes decisions about whether these findings represent valid risks in the 
system that should be fixed or are acceptable behaviors of the system. Users are 
now taking the output of the machine and saying “yes” or “no”. 


2.1 Generating Possible Specifications (Findings) 


To remove the bottleneck of specification, we changed the question from “Is this
policy correct?” to “Who has access?”. The response to the former is a Boolean, while
the response to the latter is a set of findings. AWS access control policies specify
who has access to a given resource, via a set of Allow and Deny statements that
grant and prohibit access, respectively. Figure 1 shows a simplified policy specifying
access to an AWS resource. This policy specifies conditions on the cloud-based
network (known as a VPC) from which the request originated and on the
organizational Amazon customer (referred to by an Org ID) who made the request. The
first statement allows access to any request whose SrcVpc is either vpc-a or vpc-b.
The second statement allows access to any request whose OrgId is o-2. However,
the third statement denies access from vpc-b unless the OrgId is o-1.

For each request, access is granted only if: (a) some Allow statement matches
the request, and (b) none of the Deny statements match the request. Consequently,
it can be quite tricky to determine what accesses are allowed by a given
policy. First, individual statements can use regular expressions, negation, and
conditionals. Second, to know the effect of an allow statement, one must consider
all possible deny statements that can overlap with it, i.e., can refer to
the same request as the allow. Thus, policy verification is not compositional, in
that we cannot determine if a policy is “correct” simply by locally checking that
each statement is “correct.” Instead, we require a global verification mechanism
that simultaneously considers all the statements and their subtle interactions
to determine if a policy grants only the intended access.
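Rules (a) and (b) can be made concrete with a small evaluator. The sketch below is a simplification under stated assumptions: the policy of Fig. 1 is written as Python data, only exact-match StringEquals and StringNotEquals conditions are modeled (no wildcards and no other IAM condition operators), and the function names are hypothetical. It shows the non-compositionality directly: the second statement alone would allow a request from vpc-b with OrgId o-2, but the third statement denies it.

```python
# Fig. 1 policy as plain Python data (a simplification of IAM policy JSON:
# only exact-match StringEquals / StringNotEquals conditions are modeled).
POLICY = [
    {"Effect": "Allow", "StringEquals": {"SrcVpc": ["vpc-a", "vpc-b"]}},
    {"Effect": "Allow", "StringEquals": {"OrgId": ["o-2"]}},
    {"Effect": "Deny", "StringEquals": {"SrcVpc": ["vpc-b"]},
     "StringNotEquals": {"OrgId": ["o-1"]}},
]


def statement_matches(stmt, request):
    """A statement matches when all of its conditions hold for the request."""
    equals_ok = all(request.get(key) in values
                    for key, values in stmt.get("StringEquals", {}).items())
    not_equals_ok = all(request.get(key) not in values
                        for key, values in stmt.get("StringNotEquals", {}).items())
    return equals_ok and not_equals_ok


def is_allowed(policy, request):
    """Granted iff (a) some Allow statement matches and (b) no Deny statement matches."""
    allowed = any(statement_matches(s, request)
                  for s in policy if s["Effect"] == "Allow")
    denied = any(statement_matches(s, request)
                 for s in policy if s["Effect"] == "Deny")
    return allowed and not denied
```

For example, `is_allowed(POLICY, {"SrcVpc": "vpc-b", "OrgId": "o-2"})` is `False`, even though a purely local check of the second statement would suggest the request is allowed.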

For the example policy sketch shown in Fig. 1, access can be summarized
through a set of three findings, which say that access is granted to a request iff:

- its SrcVpc is vpc-a, or
- its OrgId is o-2, or
- its SrcVpc is vpc-b and its OrgId is o-1.


The findings are sound, as no other requests are granted access. The findings are
also mostly precise: most of the requests that match the conditions are granted access.
The finding “OrgId is o-2” also includes some requests that are not allowed, e.g.,
when SrcVpc is vpc-b. To make the findings easier to understand, we sacrifice this
precision: precise findings would need to include negation, which would make it
harder for users to make decisions. Finally, the findings compactly summarize
the policy in three positive statements declaring who has access. In principle,
the notion of compact findings is similar to abstract counterexamples or minimizing
counterexamples [21,30,32]. Since the findings are produced by the machine
and already verified to be true, a person only has to decide whether they should be true.
The human is making a judgment call and expressing intent.

We use stratified predicate abstraction for computing the findings. Enumerating
all possible requests is computationally intractable, and even if it were
not, the resulting set of findings would be far too large to be useful. We tackle the
problem of summarizing the super-astronomical request space by using predicate
abstraction. Specifically, we make a syntactic pass over the policy to extract the
set of constants that are used to constrain access, and we use those constants
to generate a family of predicates whose conjunctions compactly describe partitions
of the space of all requests. For example, from the policy in Fig. 1 we would
extract the following predicates:


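$$p_a \equiv (\mathit{SrcVpc} = \texttt{vpc-a}), \quad p_b \equiv (\mathit{SrcVpc} = \texttt{vpc-b}), \quad p_\star \equiv (\mathit{SrcVpc} = \star),$$
$$q_1 \equiv (\mathit{OrgId} = \texttt{o-1}), \quad q_2 \equiv (\mathit{OrgId} = \texttt{o-2}), \quad q_\star \equiv (\mathit{OrgId} = \star).$$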
The first row has three predicates describing the possible value of the SrcVpc of the 
request: that it equals vpc-a or vpc-b or some value other than vpc-a and vpc-b. 
Fig. 3. Cubes generated by the predicates $p_a, p_b, p_\star, q_1, q_2, q_\star$ generated from the policy
in Fig. 1, and the result of querying ZELKOVA to check whether the requests corresponding
to each cube are granted access by the policy.


Similarly, the second row has three predicates describing the value of the OrgId of 
the request: that it equals o-1 or o-2 or some value other than o-1 and o-2. 
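The syntactic pass and the eager enumeration of all cubes can be sketched as follows. This is a hypothetical illustration rather than ZELKOVA's implementation: instead of issuing one SMT query per cube, it evaluates a single representative request per cube. That shortcut is only valid here because the simplified policy compares keys for exact equality against known constants, so one placeholder value (`OTHER`, an assumption of this sketch) behaves like every value outside the extracted constants.

```python
from itertools import product

OTHER = "<other>"  # placeholder for "any value not among the extracted constants"

# Fig. 1 policy as plain data (simplified: exact-match conditions only).
POLICY = [
    {"Effect": "Allow", "StringEquals": {"SrcVpc": ["vpc-a", "vpc-b"]}},
    {"Effect": "Allow", "StringEquals": {"OrgId": ["o-2"]}},
    {"Effect": "Deny", "StringEquals": {"SrcVpc": ["vpc-b"]},
     "StringNotEquals": {"OrgId": ["o-1"]}},
]


def extract_constants(policy):
    """Syntactic pass: collect, per request key, the constants the policy mentions."""
    constants = {}
    for stmt in policy:
        for op in ("StringEquals", "StringNotEquals"):
            for key, values in stmt.get(op, {}).items():
                constants.setdefault(key, set()).update(values)
    return {key: sorted(values) for key, values in constants.items()}


def matches(stmt, req):
    eq = all(req[k] in vs for k, vs in stmt.get("StringEquals", {}).items())
    neq = all(req[k] not in vs for k, vs in stmt.get("StringNotEquals", {}).items())
    return eq and neq


def allows(policy, req):
    allow = any(matches(s, req) for s in policy if s["Effect"] == "Allow")
    deny = any(matches(s, req) for s in policy if s["Effect"] == "Deny")
    return allow and not deny


def eager_cubes(policy):
    """Enumerate every cube (one predicate per key) and whether it is granted access."""
    constants = extract_constants(policy)
    keys = sorted(constants)
    rows = [constants[k] + [OTHER] for k in keys]  # one predicate row per key
    for values in product(*rows):
        cube = dict(zip(keys, values))
        # OTHER never equals a policy constant, so this representative request
        # behaves like every request in the cube.
        yield cube, allows(policy, cube)
```

With two keys and three predicates each, this produces nine cubes, five of which are granted access. Three of them differ only in OrgId while SrcVpc is vpc-a, which is exactly the unnecessary case split discussed next.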

We can compute findings by enumerating all the cubes generated by the
above predicates and querying ZELKOVA to determine whether the policy allows access
to the requests described by each cube. The enumeration of cubes is common in
SAT solvers and other predicate-abstraction-based approaches [8,15,36]. The
set of all the cubes is shown in Fig. 3. The chief difficulty with enumerating
all the cubes greedily is that we end up eagerly splitting cases on the values of
fields when that may not be required. For example, in Fig. 3, we split cases on
the possible value of OrgId even though it is irrelevant when SrcVpc is vpc-a.
This observation points the way to a new algorithm where we lazily generate the
cubes as follows. Our algorithm maintains a worklist of minimally refined cubes.
At each step, we (1) ask ZELKOVA if the cube allows an access that is not covered
by any of its refinements; (2) if so, we add it to the set of findings; and (3) if
not, we refine the cube “point-wise” along the values of each field individually
and add the results to the worklist. The above process is illustrated in Fig. 2.
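The lazy worklist algorithm can also be sketched in Python. This sketch rests on our reading of step (1): a cube fixes some fields to policy constants and leaves the rest unconstrained, and an access is “not covered by any refinement” when every unconstrained field takes a value outside the extracted constants. The ZELKOVA query is again replaced by evaluating one representative request against the Fig. 1 policy, which is hard-coded in the hypothetical `allows` function. Cubes that refine an existing finding are skipped.

```python
from collections import deque

# Constants extracted from the Fig. 1 policy, one predicate row per field.
CONSTANTS = {"SrcVpc": ["vpc-a", "vpc-b"], "OrgId": ["o-1", "o-2"]}
OTHER = "<other>"  # placeholder for "any value not among the field's constants"


def allows(req):
    """Fig. 1 policy: some Allow statement matches and no Deny statement matches."""
    allow = req["SrcVpc"] in ("vpc-a", "vpc-b") or req["OrgId"] == "o-2"
    deny = req["SrcVpc"] == "vpc-b" and req["OrgId"] != "o-1"
    return allow and not deny


def uncovered_access(cube):
    """Stand-in for the ZELKOVA query: is some request allowed whose fixed fields
    match the cube and whose unconstrained fields avoid every constant?"""
    return allows({field: cube.get(field, OTHER) for field in CONSTANTS})


def refinements(cube):
    """Point-wise refinement: fix one more unconstrained field to each constant."""
    for field, values in CONSTANTS.items():
        if field not in cube:
            for value in values:
                yield {**cube, field: value}


def stratified_findings():
    findings, seen = [], set()
    worklist = deque([{}])  # start from the top cube: no field constrained
    while worklist:
        cube = worklist.popleft()  # FIFO order visits coarser cubes first
        key = frozenset(cube.items())
        if key in seen or any(f.items() <= cube.items() for f in findings):
            continue  # already visited, or subsumed by an existing finding
        seen.add(key)
        if uncovered_access(cube):
            findings.append(cube)
        else:
            worklist.extend(refinements(cube))
    return findings
```

On this policy, the sketch returns exactly the three findings listed earlier: SrcVpc is vpc-a; OrgId is o-2; and SrcVpc is vpc-b with OrgId o-1. The cube for SrcVpc vpc-a is never split on OrgId.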

The specifications or findings generated by the machine are presented in the 
context of the access control domain. The developers do not have to learn a 
new means to specify correctness, think about what they want to be correct 
of the system, or check the completeness of their specifications. This is a very 
important lesson that we need to apply across many other applications for formal 
methods to be successful at scale. The challenge is that the specifics depend on
the domain.


3 Domain-Specific Abstractions 
It’s all about the end user 


ZELKOVA was developed by formal methods subject matter experts who learned the
domain of AWS access control policies. Once we had the analysis engine, we faced
the same challenges that all other formal methods tool developers had faced before
us. How do we make it accessible to all users? One hard-earned lesson was
“eliminating the need for specifications”, as discussed in the previous section.
But that was only part of the answer. There was a lot more to do, and many more
questions to answer: How do we get users to use it? How do we present the results
to the users? How do the results stay updated? The answer was to design and build
domain-specific abstractions. Do one thing and do it really well.

Fig. 4. Interface that presents Access Analyzer findings to users.

We created a higher-level service on top of ZELKOVA called IAM Access
Analyzer. We provide a one-click way to enable Access Analyzer for an AWS
account or AWS Organization. An account in AWS is a fundamental construct 
that serves as a container for the user’s resources, workloads, and data. Users 
can create policies to grant access to resources in their account to other users. 
In Access Analyzer, we use the account as a zone of trust. This abstraction lets 
us say that access to resources by users within their zone of trust is considered 
safe. But access to resources outside their zone of trust is potentially unsafe. 
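A minimal sketch of the zone-of-trust abstraction follows. The `Grant` record, the account IDs, and the use of `"*"` for “all principals” are hypothetical modeling choices made for illustration. A set of account IDs can stand for either a single account or an AWS Organization. In the system described here, who has access is itself computed by the policy analysis rather than listed by hand.

```python
from dataclasses import dataclass

ALL_PRINCIPALS = "*"  # hypothetical marker for public access ("All Principals")


@dataclass(frozen=True)
class Grant:
    """One way a principal can access a resource (hypothetical model)."""
    resource: str
    principal_account: str
    access_level: str


def external_access(grants, zone_of_trust):
    """Keep only grants to principals outside the zone of trust (a set of
    account IDs); access from inside the zone is considered safe."""
    return [g for g in grants if g.principal_account not in zone_of_trust]


ZONE = {"111111111111"}  # hypothetical account ID forming the zone of trust
GRANTS = [
    Grant("bucket-c", "111111111111", "Write"),  # inside the zone: not reported
    Grant("bucket-c", "222222222222", "Read"),   # another account: reported
    Grant("bucket-c", ALL_PRINCIPALS, "Read"),   # public: reported
]
```

`external_access(GRANTS, ZONE)` keeps only the last two grants. Those are the ones a user needs to review.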

Once a user enables Access Analyzer, we use stratified predicate abstraction
to analyze the policies and generate findings showing which users outside the zone
of trust have access to resources. We had to shift from a mode where ZELKOVA
can answer “any access query” to one where ZELKOVA can enumerate “who has access
to what”. This brings to attention the permissions that could lead to unintended
access to data. While this idea seems simple in hindsight, it took us a couple of
years to figure out the right abstraction for the domain. The result can be used by
all AWS users: they do not need to be experts in formal methods or even have a deep
understanding of how access control in the cloud works.

Each finding includes details about the resource, the external entity with
access to it, and the permissions granted, so that the user can take appropriate
action. We present example findings in Fig. 4. Note that these findings are not
presented as SMT-LIB formulas but rather in the domain that the user expects:
AWS access control constructs. These map to the findings presented in the previous
section for Fig. 1. Users can view the details included in a finding to
determine whether the access is intentional or a potential risk that they
should resolve.

Most automated reasoning tools are run as a one-off: prove something, and
then move on to the next challenge. In the cloud environment, this was not
the case. Doing the analysis once was not sufficient in our domain. We had
to design a means to continuously monitor the environment and changes to
access control policies within the zone of trust, and to update the findings
accordingly. To that end, Access Analyzer analyzes a policy whenever a user adds
a new policy or changes an existing one, and it generates new findings, removes
findings, or updates the existing findings. Access Analyzer also analyzes all
policies periodically, so that even in the rare case that the system misses a
policy change event, the findings stay up to date. The ease of enablement,
just-in-time analysis on updates, and periodic analysis across all policies are
the key factors in getting us to a billion queries daily.
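The combination of event-driven and periodic analysis can be sketched as a small class. Everything here is hypothetical. `analyze` stands in for the findings computation (for instance, stratified predicate abstraction), policies are keyed by a resource name, and a policy of `None` models a deleted policy. The periodic sweep repairs the state when a change event was missed.

```python
class FindingsMonitor:
    """Keeps per-resource findings in sync with access control policies."""

    def __init__(self, analyze):
        self.analyze = analyze  # hypothetical: policy -> iterable of findings
        self.findings = {}      # resource -> frozenset of current findings

    def on_policy_change(self, resource, policy):
        """Just-in-time analysis of one added, changed, or deleted (None) policy.
        Returns (new findings, resolved findings) for that resource."""
        new = frozenset(self.analyze(policy)) if policy is not None else frozenset()
        old = self.findings.get(resource, frozenset())
        if new:
            self.findings[resource] = new
        else:
            self.findings.pop(resource, None)
        return new - old, old - new

    def periodic_sweep(self, all_policies):
        """Re-analyze every policy so that a missed change event cannot leave
        stale findings behind; drops resources that no longer have a policy."""
        for resource in set(self.findings) - set(all_policies):
            self.on_policy_change(resource, None)
        for resource, policy in all_policies.items():
            self.on_policy_change(resource, policy)
```

Returning the added and resolved findings separately mirrors the three outcomes of an update: findings are generated, removed, or updated.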


4 SMT Solving at Cloud Scale 


Every query matters 


The use of SMT solving in AWS features and services means that millions of 
users are relying on the correctness and timeliness of the underlying solvers for 
the security of their cloud infrastructure. The challenges around correctness and 
timeliness in solver queries have been well studied in the automated reasoning 
community, but they have been treated as independent concerns. Today, we are
generating a billion SMT queries every day to support various use cases across 
a wide variety of AWS services. We have discovered an intricate dependency 
between correctness and timeliness that manifests at this scale. 


4.1 Monotonicity in Runtimes Across Solver Versions 


ZELKOVA uses a portfolio solver to discharge its queries. When given a query,
ZELKOVA invokes multiple solvers in the backend and uses the result from the
solver that returns first, in a winner-takes-all strategy [6]. The portfolio
approach allows us to leverage the diversity among solvers. One of our goals is
to leverage the latest advancements in the SMT solver community. SMT solver
researchers and developers are fixing issues, improving existing features, adding
new theories, adding features such as proof generation, and making other
performance improvements. Before deploying a new version of a solver in the
production environment, we perform extensive offline testing and benchmarking to
gain confidence in the correctness of the answers and the performance of the
queries, and to ensure there are no regressions.

While striving for correctness and timeliness, one of the challenges we face
is that new solver versions are not monotonically better in their performance
than their previous versions. A solution that works well in the cloud setting is a
massive portfolio, sometimes even containing older versions of the same solver.
This presents two issues. First, when we discover a bug in an older version of
a solver, we need to patch that old version, which creates the operational burden
of maintaining many different versions of different solvers. Second, as the number
of solvers increases, we need to ensure that each solver provides a correct result.
Checking the correctness of queries that result in SAT is straightforward, because
the returned model can be checked against the formula, but for UNSAT queries SMT
solvers need to provide proofs. Proof generation and checking need to be timely
as well.
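The asymmetry between checking SAT and UNSAT answers can be illustrated in the propositional case. The sketch below is not an SMT proof checker, since SMT proofs also have to cover theory reasoning. It is a hypothetical minimal example over CNF clauses written as sets of integer literals, following the DIMACS convention where `-v` is the negation of variable `v`. A SAT answer is checked by evaluating the clauses under the returned model. An UNSAT answer needs a certificate; here that certificate is a resolution proof that derives the empty clause.

```python
def check_model(clauses, model):
    """SAT check: every clause has a literal that is true under the model
    (model maps variable -> bool; unassigned variables satisfy nothing)."""
    return all(any(model.get(abs(lit)) == (lit > 0) for lit in clause)
               for clause in clauses)


def resolve(c1, c2, pivot):
    """Resolution rule: from (C1 or p) and (C2 or not p), derive (C1 or C2)."""
    if pivot not in c1 or -pivot not in c2:
        raise ValueError("pivot must occur positively in c1 and negatively in c2")
    return (c1 - {pivot}) | (c2 - {-pivot})


def check_resolution_proof(clauses, steps):
    """UNSAT check: each step (i, j, pivot) resolves two earlier clauses (input
    or derived), and the final derived clause must be empty."""
    derived = [frozenset(c) for c in clauses]
    for i, j, pivot in steps:
        if not (0 <= i < len(derived) and 0 <= j < len(derived)):
            return False
        try:
            derived.append(resolve(derived[i], derived[j], pivot))
        except ValueError:
            return False
    return bool(steps) and not derived[-1]


# (x1 or x2), (not x1 or x2), (x1 or not x2), (not x1 or not x2): unsatisfiable.
UNSAT_CLAUSES = [{1, 2}, {-1, 2}, {1, -2}, {-1, -2}]
PROOF = [(0, 1, 1), (2, 3, 1), (4, 5, 2)]  # derives {2}, then {-2}, then {}
```

Checking the model takes a single pass over the clauses. The UNSAT side depends on the solver emitting a proof and on a separate checker replaying it. Both steps add time to every query.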
Fig. 5. Comparing the runtimes for solving SMT queries generated by ZELKOVA with
CVC4 and different cvc5 versions: (a) CVC4 vs. cvc5 version 0.0.4, (b) CVC4 vs.
cvc5 version 0.0.7. Comparing the runtimes of the winner-takes-all portfolio solver
of ZELKOVA with (c) a portfolio solver consisting of the Z3 sequence string solver, the
Z3 automata solver, and cvc5 version 0.0.4, and (d) a portfolio solver consisting of
the Z3 sequence string solver, the Z3 automata solver, and cvc5 version 0.0.7.
Evaluating the performance of the latest cvc5 version 1.0.0 against its older versions
(e) cvc5 version 0.0.4 and (f) cvc5 version 0.0.7.
In the ZELKOVA portfolio solver [6], we use CVC4, and our original goal was to
replace CVC4 with the then-latest version of cvc5 (version 0.0.4)¹. We wanted
to leverage the proof checking capabilities of cvc5 to ensure the correctness of
UNSAT queries [11]. To check the timeliness requirements, we ran experiments
across our benchmarks, comparing the results of CVC4 to those of cvc5 (version
0.0.4). The results across a representative set of queries are shown in Fig. 5(a).
The graph contains approximately 15,000 SMT queries generated by ZELKOVA;
we selected a distribution of queries that are solved between 1 s and 30 s, where
30 s is the time bound after which the solver process is killed and a timeout is
reported. Some queries that are not solved by CVC4 within the time bound of 30 s
are now solved by cvc5 (version 0.0.4), as seen by the points along the right edge
of the graph. However, cvc5 (version 0.0.4) times out on some queries that are
solved by CVC4, as seen by the points at the top of the graph.

The results presented in Fig. 5(b) are not surprising given that the problem 
space is computationally hard, and there is an inherent randomness in search 
heuristics within SMT solvers. In an evaluation of cvc5, the authors discuss 
examples where CVC4 outperforms cvc5 [10]. But this poses a challenge for us 
when we are using the result of these solvers in security controls and services that 
millions of users rely on. The changes did not meet the timeliness requirement
of continuing to solve the queries within 30 s. When a query times out, the
analysis must remain sound, so it marks the bucket as public. Consequently, if a
query that was previously solved now times out, the user can lose access to the
resource. This is unexpected for the user because there was no change in their
configuration.
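How a slower solver turns into lost access can be sketched with a sound verdict function. The names, the per-query runtimes, and the ground-truth flags below are made up for illustration and are not measurements from this paper. A runtime of `None` models a timeout, and any query over the 30 s bound is treated the same way.

```python
from typing import Dict, List, Optional

TIME_BOUND_S = 30.0  # time bound after which a query is reported as a timeout


def verdict(runtime_s: Optional[float], is_public: bool) -> str:
    """Sound verdict for one "is this bucket public?" query: a timeout (None or
    over the bound) cannot be reported as "not public", so it becomes "public"."""
    if runtime_s is None or runtime_s > TIME_BOUND_S:
        return "public"
    return "public" if is_public else "not public"


def lockouts(old_runtimes: Dict[str, Optional[float]],
             new_runtimes: Dict[str, Optional[float]],
             is_public: Dict[str, bool]) -> List[str]:
    """Queries whose verdict flips from "not public" to "public" only because the
    new solver version is slower: the user loses access without changing anything."""
    return sorted(
        q for q in old_runtimes
        if verdict(old_runtimes[q], is_public[q]) == "not public"
        and verdict(new_runtimes.get(q), is_public[q]) == "public"
    )
```

With old runtimes `{"q1": 2.0, "q2": None, "q3": 5.0}`, new runtimes `{"q1": None, "q2": 3.0, "q3": 4.0}`, and only `q3` actually public, `lockouts` reports `["q1"]`. The improvement on `q2` does not offset the regression on `q1` for the user who owns it.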

For example, consider the security checks in Amazon S3 Block Public Access
that block requests based on the results of the analysis. In this context,
suppose that a bucket was marked as “not public” based on the results of a
query, and now that same query times out; the bucket will be marked as
“public”. This will lock down access to the bucket, and the intended users will
not be able to access it. Even a single regression that leads to loss of access for
the user is not an acceptable change. As another example, these security checks
are also used by IoT devices. In the case of a smart lock, a timeout in a query
that was previously being solved could lead to a loss of access to the user’s home.
The criticality of these use cases, combined with end-user expectations, is a
key challenge in our domain.

We debugged and fixed the issue in cvc5 that was causing certain queries
to time out. But even with this fix, CVC4 was 2x faster than cvc5 on many
easier problems that originally took 1 s to solve. This slowdown was significant
for us because ZELKOVA is called in the request path of security controls such as
Amazon S3 Block Public Access. When a user attempts to attach a new access
control policy or update an existing one, a synchronous call is made to ZELKOVA
and the corresponding portfolio solvers to determine whether the access control
policy being attached grants unrestricted public access. The bulk of the analysis
time is spent in the SMT solvers, so doubling the analysis time for queries can
lead to a degraded user experience. Where and how the analysis results are used
plays an important role in how we track changes to the timeliness of the solver
queries.

1 Note that while this section talks in detail about the CVC solver, the observations are
common across all solvers. We selected the results of the CVC solver as representative
because it is a mature solver with an active community.

Our solution was to add a new solver to the portfolio rather than replace an
existing solver. We added cvc5 (version 0.0.7) to the existing portfolio of solvers
consisting of CVC4, Z3 with the sequence string solver, and a custom Z3-based
automata solver. When we started the evaluation of cvc5, we did not plan to add
a new version of the CVC solver to the portfolio. We had expected the latest
version of cvc5 to be comparable in timeliness to CVC4. We worked closely with
the CVC developers, and cvc5 was better on many queries, but it did not meet
our timeliness requirements on all queries. This led to our decision to add cvc5
(version 0.0.7) to the ZELKOVA portfolio solver.

Fig. 5(c) compares a winner-takes-all portfolio of the two Z3 solvers, CVC4, and
cvc5 (version 0.0.4) with the portfolio without cvc5 (version 0.0.4). Fig. 5(d)
shows the same configuration with cvc5 (version 0.0.7). The results show that the
portfolio solving approach that ZELKOVA takes in the cloud is effective.
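An idealized model of the winner-takes-all portfolio shows why adding a solver is safer than replacing one. In this model the portfolio's runtime on a query is the minimum over its members, so a superset portfolio can never be slower. Replacing a member, by contrast, can turn a solved query into a timeout. The model ignores CPU contention between solvers running in parallel. The runtimes and the names `OLD` and `NEW` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

Runtimes = Dict[str, Optional[float]]  # query -> seconds, None for a timeout


def portfolio_runtime(solvers: Dict[str, Runtimes], query: str,
                      time_bound_s: float = 30.0) -> Optional[float]:
    """Winner-takes-all: the portfolio answers when its fastest member does."""
    times = [rt[query] for rt in solvers.values() if rt.get(query) is not None]
    best = min(times, default=None)
    return best if best is not None and best <= time_bound_s else None


def regressions(baseline: Dict[str, Runtimes], candidate: Dict[str, Runtimes],
                queries: Iterable[str]) -> List[str]:
    """Queries the baseline portfolio solves in time but the candidate does not."""
    return [q for q in queries
            if portfolio_runtime(baseline, q) is not None
            and portfolio_runtime(candidate, q) is None]


# Hypothetical per-query runtimes (not measurements from the paper).
OLD = {"q1": 1.0, "q2": None, "q3": 4.0}
NEW = {"q1": None, "q2": 2.0, "q3": 3.0}
```

Replacing `OLD` with `NEW` regresses `q1`. Adding `NEW` alongside `OLD` has no regressions and also solves `q2`. The cost of adding is the operational burden of maintaining more solver versions, described above.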

The cycle now repeats with cvc5 (version 1.0.0), and the same question comes
up again: “Do we upgrade the existing cvc5 version to the latest one, or add yet
another version of CVC to the portfolio solver?” Early experiments show that
there is no clear answer yet. The results so far comparing the different versions
of cvc5, shown in Fig. 5(e) and (f), indicate that the latest version of cvc5 is
not monotonically better in performance than either of its previous versions. We
do want to leverage the better proof generation capabilities of cvc5 (version
1.0.0) in order to gain more assurance in the correctness of the UNSAT queries.


4.2 Stability of the Solvers 


We have spent quite a bit of time defining and implementing the encoding of the 
AWS access control policies into SMT. We update the encoding as we expand 
to more use cases or when we support new features in AWS. This is a slow and 
careful process that requires expertise in understanding AWS and how SMT 
solvers work. There is a lot of trial and error to figure out what encoding is 
correct and performant. 

To illustrate the importance of the encoding, we present an experiment on
solver runtimes with different orderings of clauses for our encoding (Fig. 6). For
the same set of problem instances used in Fig. 5, we now use the standard SMT
competition shuffler² to reorder assertions and terms and to rename variables, in
order to study the effect of clause ordering on our default encoding. In Fig. 6, each
point on the x axis corresponds to a single problem instance. We run each problem
instance in its original form (the default encoding), whose runtime is the “base time”,
and in five shuffled versions. This gives us a total of six versions of the problem, for
which we record the min, max, and mean times. So for each problem instance x we have:

1. (x, base time): time on the original problem;

2. (x, min time): minimal time on the original and 5 shuffled problems;

3. (x, max time): maximal time on the original and 5 shuffled problems; and

4. (x, mean time): mean time on the original and 5 shuffled problems.

Fig. 6. Variance in runtimes after shuffling terms in the problem instances.

2 https://github.com/SMT-COMP/scrambler.
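The per-instance summary in items 1-4 above amounts to a few lines of Python. This sketch assumes runtimes are already measured in seconds (a timeout could, for example, be recorded as the time limit); the function names are illustrative, and the runtimes in the example are hypothetical, not data from Fig. 6.

```python
from statistics import mean
from typing import Dict, List


def runtime_summary(base: float, shuffled: List[float]) -> Dict[str, float]:
    """Summarize one instance: its original run plus its shuffled variants."""
    runs = [base] + list(shuffled)  # six runs when five shuffled versions are given
    return {"base": base, "min": min(runs), "max": max(runs), "mean": mean(runs)}


def sort_by_base(summaries: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Order instances by base time, as on the x axis of Fig. 6."""
    return sorted(summaries, key=lambda s: s["base"])


# Hypothetical runtimes in seconds, not measurements from the experiment.
print(runtime_summary(1.0, [0.5, 2.0, 1.0, 1.5, 0.0]))
# {'base': 1.0, 'min': 0.0, 'max': 2.0, 'mean': 1.0}
```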


The instances are sorted by base time, so the base-time line looks smooth and
the other points look more scattered. The comparison between CVC4 in Fig. 6(a)
and cvc5 in Fig. 6(b) shows that cvc5 can solve more problems with the default
encoding, as shown by the smooth base line. However, when we shuffle the
assertions, terms, and other constructs in the problem instance, the performance of
cvc5 varies more dramatically than that of CVC4. The points for the
maximal time are spread wider across the graph, and there are now several
timeouts in Fig. 6(b).
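The shuffling step can be approximated by a toy scrambler. The Python sketch below is not the SMT-COMP scrambler; it is a hypothetical simplification that renames declared symbols, shuffles the arguments of a few commutative operators, and permutes the top-level assertions, all driven by a seed. It ignores comments, string literals, and quoted symbols, and assumes the fresh names v0, v1, ... do not clash with existing symbols. Each rewrite preserves satisfiability, yet such changes can still affect solver runtime.

```python
import random
import re
from typing import List, Union

SExpr = Union[str, List["SExpr"]]

# Operators whose arguments can be reordered without changing the meaning.
COMMUTATIVE = {"and", "or", "+", "*", "=", "distinct"}


def parse(text: str) -> List[SExpr]:
    """Parse an SMT-LIB script into its top-level s-expressions (toy tokenizer)."""
    stack: List[List[SExpr]] = [[]]
    for tok in re.findall(r"\(|\)|[^\s()]+", text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0]


def show(e: SExpr) -> str:
    """Print an s-expression back in SMT-LIB syntax."""
    return e if isinstance(e, str) else "(" + " ".join(show(x) for x in e) + ")"


def is_cmd(c: SExpr, *names: str) -> bool:
    return isinstance(c, list) and len(c) > 0 and c[0] in names


def scramble(script: str, seed: int) -> str:
    rng = random.Random(seed)
    cmds = parse(script)

    # 1. Rename declared symbols to fresh names v0, v1, ... in random order.
    declared = [c[1] for c in cmds if is_cmd(c, "declare-fun", "declare-const")]
    order = rng.sample(declared, len(declared))
    renaming = {name: f"v{i}" for i, name in enumerate(order)}

    def walk(e: SExpr) -> SExpr:
        if isinstance(e, str):
            return renaming.get(e, e)
        out = [walk(x) for x in e]
        # 2. Shuffle the arguments of commutative operators.
        if out and isinstance(out[0], str) and out[0] in COMMUTATIVE:
            args = out[1:]
            rng.shuffle(args)
            out = [out[0]] + args
        return out

    cmds = [walk(c) for c in cmds]

    # 3. Permute the assert commands among the positions they occupy.
    slots = [i for i, c in enumerate(cmds) if is_cmd(c, "assert")]
    permuted = [cmds[i] for i in slots]
    rng.shuffle(permuted)
    for i, a in zip(slots, permuted):
        cmds[i] = a
    return "\n".join(show(c) for c in cmds)
```

For example, scramble(script, seed=1) applied to a script that declares a single predicate Inv returns an equisatisfiable script in which Inv is renamed to v0; different seeds give different orderings.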


4.3 Concluding Remarks 


Based on our experience from generating a billion SMT queries a day, we pro- 
pose some general areas of research for the community. We believe these are 
key to enabling the use of solvers to evaluate security controls, and to enable 
applications in emerging technologies such as quantum computing, blockchains, 
and bio-engineering. 


Monotonicity and Stability in Runtimes. One of the main challenges we 
encountered is the lack of monotonicity and stability in runtimes within a given 
solver version and across different versions. Providing this stability is a funda- 
mentally hard problem due to the inherent randomness in SMT solver heuristics, 
search strategies, and configuration flags. One approach would be to incorporate
the algorithm portfolio approach [31,34,42] within mainstream SMT solvers. One
way to enable algorithm portfolios is to leverage serverless and cloud computing
environments and to develop parallel SMT solving and distributed search strategies.
At AWS, this is an area that we are investing in as well. There has been
some work on parallel and distributed SMT solving [41,45], but we need more.
Another research direction would be to develop specialized solvers that focus on
a specific class of problems. SMT-COMP could devise categories that allow
room for specific types of problem instances as an incentive for developing these
solvers.


Reduce the Barrier to Entry. Generating a billion SMT queries a day is a
result of the exceptional work and innovation of the entire SMT community
over the past 20 years. A question we are thinking about is how to replicate
the success described here for other domains in Amazon and elsewhere. There 
is a natural tendency in the formal methods community to target tools for the 
expert user. This limits their broader use and applicability. If we can find ways 
to lower the barrier to adoption, we can gain greater traction and improve the 
security, correctness, availability, and robustness of more systems. 
More Abstractions. SMT solvers are powerful engines. One potential research
direction for the broader community is to provide one or more higher-level
languages that allow people to specify their problems. We could create different
languages based on the domain and take into account the expectations of
developers. This would make interacting with a solver more of a black-box exercise.
The success we have had with SMT at Amazon can be recreated in other domains
if we provide developers the ability to easily encode their problems in a higher-level
language and use SMT solvers to solve them. This approach scales more easily
because it does not require a formal methods expert as an intermediary. Developing
new abstractions or intermediate representations could be one approach to unlock
billions of other SMT queries.


Proof Generation. All SMT solvers should generate proofs to help the
end-user gain confidence in the results. There has been some initial work in this
area [9,20,27,43,44], but SMT has a long way to go to catch up with SAT solvers,
and for good reason. Proof production is important for us to gain greater
confidence in the correctness of our answers, though it creates a tension with
timeliness. We need proof production to be performant and the tools that
check the generated proofs to be correct themselves. Work on different
testing approaches, including fuzzing and property-based testing of SMT solvers,
should continue with the same rigor and enthusiasm. Using fuzz testing
and mutation testing techniques in the development workflow of SMT
solvers is something that should become mainstream.

We are working to provide a set of benchmarks that can be leveraged by 
SMT developers to help further their work, are funding research grants in these 
areas, and are willing to evaluate new solvers. 
Abstract. Many problems in program verification, Model Checking,
and type inference are naturally expressed as satisfiability of a verification
condition expressed in a fragment of First-Order Logic called Constrained
Horn Clauses (CHC). This transforms program analysis and
verification tasks to the realm of first-order satisfiability and into the
realm of SMT solvers. In this paper, we give a brief overview of how
CHCs capture verification problems for sequential imperative programs,
and discuss the CHC solving algorithm underlying the SPACER engine of the
SMT solver Z3.


1 Introduction 


First Order Logic (FOL) is a powerful formalism that naturally captures many
interesting decision (and optimization) problems. In recent years, there has been
tremendous progress in automated logic reasoning tools, such as Boolean
SATisfiability Solvers (SAT) and Satisfiability Modulo Theory (SMT) solvers. This
enabled the use of logic and logic satisfiability solvers as a universal solution to
many problems in Computer Science, in general, and in Program Analysis, in
particular. Most new program analysis techniques formalize the desired analysis
task in a fragment of FOL, and delegate the analysis to a SAT or an SMT solver.
Examples include deductive verification tools such as Dafny [30] and Why3 [13], 
symbolic execution engines such as KLEE [7], Bounded Model Checking engines 
such as CBMC [10] and SMACK [9], and many others. 

In this paper, we focus on a fragment of FOL called Constrained Horn 
Clauses (CHC). CHCs arise in many applications of automated verification. 
They naturally capture such problems as discovery and verification of induc- 
tive invariants [4,18]; Model Checking of safety properties of finite- and 
infinite-state systems [2,23]; safety verification of push-down systems (and their 
extensions) [4,28]; modular verification of distributed and parameterized
systems [17,19,33]; type inference [35,36]; and many others.

Using CHC, developers of program analysis tools can separate the process of 
developing a proof methodology (also known as generation of Verification Con- 
dition (VC)) from the algorithmic details of deciding whether the VC is correct. 
Such a flexible design simplifies supporting multiple proof methodologies, mul- 
tiple languages, and multiple verification tasks with a single framework. Today, 
there are multiple effective program verification tools based on the CHC
methodology, including a C/C++ verification framework SeaHorn [18], a Java
verification framework JayHorn [25], an Android information flow verification tool
HornDroid [8], a Rust verification framework RustHorn [31], and the Solidity
verification tools SmartACE [37] and Solidity Compiler Model Checker [1]. Many
more approaches utilize CHC as part of a more general verification solution.

The idea of reducing program verification (and model checking) to FOL
satisfiability is well researched. A great example is the use of Constraint Logic
Programming (CLP) [24] in program verification, or the use of Datalog for pointer
analysis [34]. What is unique is the application of SMT solvers in the decision
procedure and the lifting of techniques that have been developed in the Model
Checking and Program Verification communities to the uniform setting of
satisfiability of CHC formulas. In the rest of this paper, we show how verification
problems can be represented in CHCs (Sect. 2), and describe key algorithms behind
SPACER [27], the CHC engine of the SMT solver Z3 [32] that is used to solve them
(Sect. 3).


2 Logic of Constrained Horn Clauses 


In this section, we give a brief overview of Constrained Horn Clauses (CHC). We 
illustrate an application of CHC to verification of a simple imperative program 
with a loop. 

The logic of Constrained Horn Clauses is a fragment of FOL. We assume
that the reader is familiar with the basic concepts of FOL, including signatures,
theories, and models. For the purpose of this presentation, let Σ be some fixed
FOL signature and A be an FOL theory over Σ. For example, Σ is a signature
for arithmetic, including constants 0 and 1 and a binary function · + ·, and A is
the theory of Presburger arithmetic. A Constrained Horn Clause (CHC) is an
FOL sentence of the form:


$$\forall V \cdot \left(\varphi \land p_1(X_1) \land \cdots \land p_k(X_k) \Rightarrow h(X)\right) \qquad (1)$$


where $V$ is the set of all free variables in the body of the sentence, $\varphi$ is a
constraint (a formula in the language of A), $\{p_i\}_{i=1}^{k}$ and $h$ are uninterpreted
predicate symbols (in the signature), $\{X_i\}_{i=1}^{k}$ and $X$ are lists of first-order
terms, and $p(X)$ stands for the application of predicate $p$ to a list of terms $X$.

A CHC in Eq. (1) can be equivalently written as the following clause: 


$$\neg\varphi \lor \neg p_1(X_1) \lor \cdots \lor \neg p_k(X_k) \lor h(X) \qquad (2)$$


where all free variables are implicitly universally quantified. Note that in this 
case only h appears positively, which explains why these are called Horn clauses. 
We write CHC(A) to denote the set of all sentences in FOL modulo theory A 
that can be written as a set of Constrained Horn Clauses. A sentence Φ is in
CHC(A) if it can be written as a conjunction of clauses of the form of Eq. (1).
assume(x <= 0);
while (x < 5) {
  x = x + 1;
}
assert(x < 10);

$\forall x \cdot x \le 0 \Rightarrow \mathit{Inv}(x)$
$\forall x, y \cdot \mathit{Inv}(x) \land x < 5 \land y = x + 1 \Rightarrow \mathit{Inv}(y)$
$\forall x \cdot \mathit{Inv}(x) \land \neg(x < 5) \land \neg(x < 10) \Rightarrow \mathit{false}$

Fig. 1. A program and its verification conditions in CHC.


A CHC(A) sentence Φ is satisfiable if there exists a model M of A extended
with interpretations for all of the uninterpreted predicates in Φ such that M
satisfies Φ, written M ⊨ Φ. In practice, we are often interested not in an arbitrary
model, but in a model that can be described concisely in some target fragment of
FOL. We call such models solutions. Given an FOL fragment F, an F-solution
to a CHC(A) formula Φ is a model M such that M ⊨ Φ and the interpretation of
every uninterpreted predicate in M is definable in F. Most commonly, F is taken
to be either a quantifier-free or a universally quantified fragment of arithmetic A,
often further extended with arrays.


Example 1. To illustrate the definitions above, consider a C program of a simple
counter shown in Fig. 1. The goal is to verify that the assertion at the end of the
program holds on every execution. To verify the assertion using the principle
of inductive invariants, we need to show that there exists a formula Inv(x)
over program variable x such that (a) it is true before the loop, (b) it is stable at
every iteration of the loop, and (c) it guarantees the assertion when the loop
terminates. Since we are interested in partial correctness, we are not concerned
with the case when the loop does not terminate. This principle is naturally encoded
as three Constrained Horn Clauses, shown in Fig. 1. The uninterpreted predicate
Inv represents the inductive invariant. The program is correct, hence the CHCs
are satisfiable. The satisfying model extends the theory of arithmetic with the
following definition of Inv:


$$\mathit{Inv}^{M} = \{z \mid z \le 5\} \qquad (3)$$


The CHCs also have a solution in the quantifier free theory of Linear Integer 
Arithmetic. In particular, Inv can be defined as follows: 


$$\mathit{Inv} = \lambda z \cdot z \le 5 \qquad (4)$$


where the notation λx · y stands for a function with argument x and body y.
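To make the three conditions concrete, the following Python sketch checks a candidate interpretation of Inv against the CHCs of Fig. 1 for every integer in a finite window. It is only a bounded sanity check under that assumption, not a proof of satisfiability (a CHC solver such as SPACER reasons about all integers); the function name check_chcs and the window bounds are illustrative.

```python
from typing import Callable

Pred = Callable[[int], bool]


def check_chcs(inv: Pred, lo: int = -50, hi: int = 50) -> bool:
    """Check the three CHCs of Fig. 1 for a candidate Inv on x in [lo, hi].
    A bounded sanity check only; it is not a proof for all integers."""
    for x in range(lo, hi + 1):
        # Initiation: x <= 0 => Inv(x)
        if x <= 0 and not inv(x):
            return False
        # Consecution: Inv(x) and x < 5 and y = x + 1 => Inv(y)
        y = x + 1
        if inv(x) and x < 5 and not inv(y):
            return False
        # Safety: Inv(x) and not(x < 5) and not(x < 10) => false
        if inv(x) and not x < 5 and not x < 10:
            return False
    return True


print(check_chcs(lambda z: z <= 5))  # True: the invariant of Eqs. (3) and (4)
print(check_chcs(lambda z: z < 5))   # False: x = 4 steps to 5, which z < 5 excludes
print(check_chcs(lambda z: True))    # False: violates the safety clause at x = 10
```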

The CHCs in this example can be expressed as an SMT-LIB script, shown
in Fig. 2, and solved by the SPACER engine of Z3. Note that the script uses some
Z3-specific extensions, including logic HORN and several options that disable
preprocessing (which is not necessary for such a simple example).


Example 2. Figure 3 shows a similar program, but with a function inc that
abstracts away the increment operation. The corresponding CHCs are also shown
(set-logic HORN)
(set-option :fp.xform.inline_linear false)
(set-option :fp.xform.inline_eager false)
(declare-fun Inv (Int) Bool)
; initiation: x <= 0 => Inv(x)
(assert (forall ((x Int)) (=> (<= x 0) (Inv x))))
; consecution: Inv(x) and x < 5 => Inv(x + 1)
(assert (forall ((x Int)) (=> (and (Inv x) (< x 5)) (Inv (+ x 1)))))
; safety: Inv(x) and x >= 5 and x >= 10 => false
(assert (forall ((x Int)) (=> (and (Inv x) (>= x 5) (>= x 10)) false)))
(check-sat)
(get-model)

Fig. 2. CHCs from Fig. 1 in SMT-LIB format.


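int inc(int z) { return z + 1; }

assume(x <= 0);
while (x < 5) {
  x = inc(x);
}
assert(x < 10);

$\forall z, r \cdot r = z + 1 \Rightarrow \mathit{Inc}(z, r)$
$\forall x \cdot x \le 0 \Rightarrow \mathit{Inv}(x)$
$\forall x, y \cdot \mathit{Inv}(x) \land x < 5 \land \mathit{Inc}(x, y) \Rightarrow \mathit{Inv}(y)$
$\forall x \cdot \mathit{Inv}(x) \land \neg(x < 5) \land \neg(x < 10) \Rightarrow \mathit{false}$

Fig. 3. A program with a function and its verification conditions in CHC.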


in Fig. 3. There are two unknowns: Inv, which represents the desired inductive
invariant, and Inc, which represents the summary (i.e., pre- and post-conditions,
or an over-approximation) of the function inc. Since the program still satisfies
the assertion, the CHCs are satisfiable, and have the following solution:

$$\mathit{Inv}^{M} = \{z \mid z \le 5\} = \lambda z \cdot z \le 5 \qquad (5)$$
$$\mathit{Inc}^{M} = \{(z, r) \mid r = z + 1\} = \lambda z, r \cdot r = z + 1 \qquad (6)$$


The corresponding SMT-LIB script is shown in Fig. 4. 


Example 3. In this last example, consider the set of CHCs shown in Fig. 5. They
are similar to the CHCs in Fig. 1, with one exception: these CHCs are unsatisfiable.
There is no interpretation of Inv that satisfies them. This is witnessed by a
refutation, i.e., a resolution proof, shown in Fig. 6. The corresponding SMT-LIB
script is shown in Fig. 7.
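A refutation of this shape can also be found mechanically by forward search over ground facts. The Python sketch below is a hypothetical bounded search, not how SPACER works: it hard-codes the clause shapes of Fig. 5, with the threshold of the query clause as a parameter, generates facts Inv(c) only for c above an assumed lower bound, and reconstructs the chain of derived facts once the query clause fires.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def find_refutation(bad_from: int = 1, lo: int = -10) -> Optional[List[str]]:
    """Breadth-first search for a ground derivation of false from the CHCs
         x <= 0 => Inv(x)
         Inv(x) and x < 5 => Inv(x + 1)
         Inv(x) and x >= bad_from => false
    bad_from = 1 gives the CHCs of Fig. 5. Only facts Inv(c) with c >= lo are
    generated, so this is a bounded search, not a decision procedure."""
    parent: Dict[int, Optional[int]] = {}
    queue: Deque[int] = deque()
    for c in range(lo, 1):  # initial clause: Inv(c) for each c <= 0
        parent[c] = None
        queue.append(c)
    while queue:
        c = queue.popleft()
        if c >= bad_from:  # query clause fires: rebuild the derivation
            chain: List[str] = ["false"]
            cur: Optional[int] = c
            while cur is not None:
                chain.append(f"Inv({cur})")
                cur = parent[cur]
            return chain[::-1]
        if c < 5 and c + 1 not in parent:  # step clause derives Inv(c + 1)
            parent[c + 1] = c
            queue.append(c + 1)
    return None


print(find_refutation())             # ['Inv(0)', 'Inv(1)', 'false']
print(find_refutation(bad_from=10))  # None
```

With the safety clause of Fig. 1 instead (bad_from=10), the search exhausts the explored facts without deriving false, which is consistent with those CHCs being satisfiable but, being bounded, does not prove it.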


3 Solving CHC Modulo Theories 


The logic of CHC can be seen as a convenient modelling language. That is, it does 
not restrict or impose a preference on a decision procedure used to solve the prob- 
lem. In fact, a variety of solvers and techniques are widely available, including 
SPACER [28] (that is available as part of Z3), FreqHorn [12], and ELDARICA [22]. 
There is also an annual competition, CHC-COMP¹, to evaluate state-of-the-art
solvers. In the rest of this section, we give a brief overview of the algorithm 
underlying SPACER. 


1 https://chc-comp.github.io/. 
(set-logic HORN)
(set-option :fp.xform.inline_linear false)
(set-option :fp.xform.inline_eager false)
(declare-fun Inv (Int) Bool)
(declare-fun Inc (Int Int) Bool)
; summary of inc: r = z + 1 => Inc(z, r)
(assert (forall ((z Int)) (Inc z (+ z 1))))
; initiation: x <= 0 => Inv(x)
(assert (forall ((x Int)) (=> (<= x 0) (Inv x))))
; consecution through the call: Inv(x) and x < 5 and Inc(x, y) => Inv(y)
(assert (forall ((x Int) (y Int)) (=> (and (Inv x) (< x 5) (Inc x y)) (Inv y))))
; safety: Inv(x) and x >= 5 and x >= 10 => false
(assert (forall ((x Int)) (=> (and (Inv x) (>= x 5) (>= x 10)) false)))
(check-sat)
(get-model)

Fig. 4. CHCs from Fig. 3 in SMT-LIB format.


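$\forall x \cdot x \le 0 \Rightarrow \mathit{Inv}(x)$
$\forall x, y \cdot \mathit{Inv}(x) \land x < 5 \land y = x + 1 \Rightarrow \mathit{Inv}(y)$
$\forall x \cdot \mathit{Inv}(x) \land x \ge 1 \Rightarrow \mathit{false}$

Fig. 5. An example of unsatisfiable CHCs.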


SPACER is an extension and generalization of SAT-based Model Checking 
algorithms to CHC modulo SMT-supported theories. On propositional transition 
systems, SPACER behaves similarly to IC3 [6] and PDR [11], and can be seen as 
an adaptation of these algorithms. For other first-order theories, SPACER extends 
Generalized PDR of Hoder and Bjørner [21]. 

Given a CHC system Φ, SPACER works by iteratively looking for a bounded
derivation of false from Φ. It explores Φ in a top-down (or backwards) direction.
Each time SPACER fails to find a derivation of a fixed bound N, the reasons for
failure are analyzed to derive consequences of Φ that explain why a derivation
of false must have at least N + 1 steps. This process is repeated until either (a)
false is derived and Φ is shown to be unsatisfiable, (b) the consequences form a
solution to Φ, thus showing that Φ is satisfiable, or (c) the process continues
indefinitely, while continuously ruling out the existence of longer and longer
refutations. Thus, even though the problem is in general undecidable, SPACER
always makes progress trying to show that Φ is unsatisfiable or that there is no
short proof of unsatisfiability.

SPACER is a procedure for solving linear and non-linear CHCs. For conve- 
nience of the presentation, we restrict ourselves to a special case of non-linear 
CHCs that consists of the following three clauses: 


$$\mathit{Init}(X) \Rightarrow P(X) \qquad (7)$$
$$P(X) \Rightarrow \neg\mathit{Bad}(X) \qquad (8)$$
$$P(X) \land P(X^{o}) \land \mathit{Tr}(X, X^{o}, X') \Rightarrow P(X') \qquad (9)$$
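From $\forall x \cdot x \le 0 \Rightarrow \mathit{Inv}(x)$, derive $\mathit{Inv}(0)$.
From $\mathit{Inv}(0)$ and $\forall x \cdot \mathit{Inv}(x) \land x < 5 \Rightarrow \mathit{Inv}(x + 1)$, derive $\mathit{Inv}(1)$.
From $\mathit{Inv}(1)$ and $\forall x \cdot \mathit{Inv}(x) \land x \ge 1 \Rightarrow \mathit{false}$, derive $\mathit{false}$.

Fig. 6. Refutation proof for CHCs in Fig. 5.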


(set-logic HORN)
(set-option :produce-proofs true)
(set-option :fp.xform.inline_linear false)
(set-option :fp.xform.inline_eager false)
(declare-fun Inv (Int) Bool)
; initiation: x <= 0 => Inv(x)
(assert (forall ((x Int)) (=> (<= x 0) (Inv x))))
; consecution: Inv(x) and x < 5 => Inv(x + 1)
(assert (forall ((x Int)) (=> (and (Inv x) (< x 5)) (Inv (+ x 1)))))
; query: Inv(x) and x >= 1 => false
(assert (forall ((x Int)) (=> (and (Inv x) (>= x 1)) false)))
(check-sat)
(get-proof)

Fig. 7. CHCs from Fig. 5 in SMT-LIB format.


where X is a set of free variables, $X' = \{x' \mid x \in X\}$ and $X^{o} = \{x^{o} \mid x \in X\}$
are auxiliary free variables, Init, Bad, and Tr are FOL formulas over the free
variables (as indicated), and P is an uninterpreted predicate. Recall that all
free variables in each clause are implicitly universally quantified. Thus, the only 
unknown to solve for is the uninterpreted predicate P. We call these three clauses 
a safety problem, and write (Init(X), Tr(X,X°, X’), Bad(X)) as a shorthand to 
represent them. It is not hard to show that satisfiability of arbitrary CHCs is 
reducible to a safety problem. Thus, this simplification does not lose generality. In 
practice, SPACER directly supports more complex CHCs with multiple unknown 
uninterpreted predicates. 

Before presenting the algorithm, we need to introduce two concepts from 
logic: Craig Interpolation and Model Based Projection. 


Craig Interpolation. Given two formulas $A[\bar{x}, \bar{z}]$ and $B[\bar{y}, \bar{z}]$ such that $A \land B$
is unsatisfiable, a Craig interpolant $I[\bar{z}] = \mathit{ITP}(A[\bar{x}, \bar{z}], B[\bar{y}, \bar{z}])$ is a formula
$I[\bar{z}]$ such that $A[\bar{x}, \bar{z}] \Rightarrow I[\bar{z}]$ and $I[\bar{z}] \Rightarrow \neg B[\bar{y}, \bar{z}]$. We further require that the
interpolant is a clause. Intuitively, the interpolant I captures the consequences
of A that are inconsistent with B. If A is a conjunction of literals, the interpolant
can be seen as a semantic variant of an UNSAT core.


Model Based Projection. Let φ be a formula, $U \subseteq \mathit{Vars}(\varphi)$ a subset of variables
of φ, and P a model of φ. Then, $\psi = \mathit{MBP}(U, P, \varphi)$ is a model based projection
if (a) ψ is a monomial, (b) $\mathit{Vars}(\psi) \subseteq \mathit{Vars}(\varphi) \setminus U$, (c) $P \models \psi$, and
(d) $\psi \Rightarrow \exists U \cdot \varphi$. Intuitively, an MBP is an under-approximation of existential quantifier
elimination, where the choice of the under-approximation is guided by the model.
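As an illustration of conditions (a) to (d), the following Python sketch computes an MBP that eliminates a single variable x from a conjunction of non-strict bounds l ≤ x and x ≤ u over linear real arithmetic. It uses a simple model-guided substitution: x is replaced by the lower bound that is largest under the model. The term representation and function names are assumptions made for this sketch, not SPACER's API; strict inequalities, non-unit coefficients of x, and other theories are not handled.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# A linear term maps variable names to coefficients; the key "1" holds the
# constant. A constraint (lhs, rhs) stands for lhs <= rhs.
Term = Dict[str, Fraction]
Constraint = Tuple[Term, Term]
Model = Dict[str, Fraction]


def evaluate(t: Term, m: Model) -> Fraction:
    """Value of a linear term under a model."""
    return sum((c * (1 if v == "1" else m[v]) for v, c in t.items()), Fraction(0))


def mbp_one_var(lower: List[Term], upper: List[Term],
                rest: List[Constraint], m: Model) -> List[Constraint]:
    """Project x out of (and l <= x for l in lower) (and x <= u for u in upper) rest.
    Assumes x occurs in none of the terms or in rest, and that m satisfies the
    formula. Returns a conjunction (list) of constraints over the other variables."""
    if not lower:
        # x is unbounded below, so a small enough x always exists: drop its bounds.
        return list(rest)
    best = max(lower, key=lambda l: evaluate(l, m))  # largest lower bound under m
    result = list(rest)
    result += [(l, best) for l in lower if l is not best]  # best dominates the others
    result += [(best, u) for u in upper]                   # and stays below each upper bound
    return result
```

For φ = (y ≤ x) ∧ (z ≤ x) ∧ (x ≤ 10) and the model {x ↦ 5, y ↦ 1, z ↦ 3}, the largest lower bound is z, so the projection is (y ≤ z) ∧ (z ≤ 10). It holds in the model, implies ∃x · φ (take x = z), and is strictly stronger than ∃x · φ, whose quantifier-free form is (y ≤ 10) ∧ (z ≤ 10).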
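Input: A safety problem $\langle \mathit{Init}(X), \mathit{Tr}(X, X^{o}, X'), \mathit{Bad}(X) \rangle$.
Output: Unreachable or Reachable
Data: A cex queue Q, where a cex $c \in Q$ is a pair $\langle m, i \rangle$, m is a cube over
state variables, and $i \in \mathbb{N}$. A level N. A set of reachable states Reach.
A trace $F_0, F_1, \ldots$
Notation: $\mathcal{F}(A, B) = \mathit{Init}(X') \lor (A(X) \land B(X^{o}) \land \mathit{Tr})$, and $\mathcal{F}(A) = \mathcal{F}(A, A)$
Initially: $Q = \emptyset$, $N = 0$, $F_0 = \mathit{Init}$, $\forall i > 0 \cdot F_i = \emptyset$, $\mathit{Reach} = \mathit{Init}$
Require: $\mathit{Init} \rightarrow \neg\mathit{Bad}$
repeat
  Unreachable: If there is an $i < N$ s.t. $F_i \subseteq F_{i+1}$, return Unreachable.
  Reachable: If $\mathit{Reach} \land \mathit{Bad}$ is satisfiable, return Reachable.
  Unfold: If $F_N \rightarrow \neg\mathit{Bad}$, then set $N \leftarrow N + 1$ and $Q \leftarrow \emptyset$.
  Candidate: If for some m, $m \rightarrow F_N \land \mathit{Bad}$, then add $\langle m, N \rangle$ to Q.
  Successor: If there is $\langle m, i + 1 \rangle \in Q$ and a model M s.t. $M \models \psi$, where
    $\psi = \mathcal{F}(\lor \mathit{Reach}) \land m'$. Then, add s to Reach, where $s' \in \mathit{MBP}(\{X, X^{o}\}, M, \psi)$.
  MustPredecessor: If there is $\langle m, i + 1 \rangle \in Q$ and a model M s.t. $M \models \psi$, where
    $\psi = \mathcal{F}(F_i, \mathit{Reach}) \land m'$. Then, add s to Q, where $s \in \mathit{MBP}(\{X^{o}, X'\}, M, \psi)$.
  MayPredecessor: If there is $\langle m, i + 1 \rangle \in Q$ and a model M s.t. $M \models \psi$, where
    $\psi = \mathcal{F}(F_i) \land m'$. Then, add s to Q, where $s^{o} \in \mathit{MBP}(\{X, X'\}, M, \psi)$.
  NewLemma: If there is an $\langle m, i + 1 \rangle \in Q$ s.t. $\mathcal{F}(F_i) \land m'$ is unsatisfiable, then add
    $\varphi = \mathit{ITP}(\mathcal{F}(F_i), m')$ to $F_j$, for $0 \le j \le i + 1$.
  ReQueue: If $\langle m, i \rangle \in Q$, $0 < i < N$, and $\mathcal{F}(F_{i-1}) \land m'$ is unsatisfiable, then add
    $\langle m, i + 1 \rangle$ to Q.
  Push: For $0 \le i < N$ and a clause $(\varphi \lor \psi) \in F_i$, if $\varphi \notin F_{i+1}$ and $\mathcal{F}(\varphi \land F_i) \rightarrow \varphi'$,
    then add φ to $F_j$, for all $j \le i + 1$.
until ∞;
Algorithm 1: Rule-based description of SPACER.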


We present SPACER [27] as a set of rules shown in Algorithm 1. While the 
algorithm is sound under any order on application of the rules, it is easy to see 
that only some orders lead to progress. Since solving CHCs even over LIA is unde- 
cidable, we are only concerned with soundness and progress, and do not discuss 
termination. The algorithm is based on the core principles of IC3 [5], however, 
it differs significantly in the details. The rules Unreachable and Reachable 
detect termination, either by discovering an inductive solution, or by discovering 
existence of a refutation, respectively. Unfold increases the exploration depth, 
and Candidate constructs a new proof obligation based on the current depth 
and the set Bad of bad states. Successor computes additional reachable states,
that is, an under-approximation of the model of the implicit predicate P. Note
that it uses Model Based Projection to under-approximate the forward predicate
transformer. The rules MustPredecessor and MayPredecessor compute a
new proof obligation that precedes an existing one. MustPredecessor does 
the computation based on existing reachable states, while MayPredecessor 
makes a guess based on existing over-approximation of P. In this case, MBP is 
used again, but now to under-approximate a backward predicate transformer. 
The rule NewLemma computes a new over-approximation, called a lemma, of
what is derivable about P at level i + 1 by blocking a proof obligation. This is very
similar to the corresponding step in IC3. Note, however, that interpolation is
used to generalize the learned lemma beyond the literals of the proof obligation.
ReQueue allows pushing blocked proof obligations to a higher level, and Push
allows pushing and inductively generalizing lemmas.

SPACER was introduced in [27]. An extension for convex linear arithmetic (i.e.,
discovering convex and co-convex solutions) is described in [3]. Support for
quantifier-free solutions for CHC over the combined theories of arrays and
arithmetic is described in [26]. An extension for quantified solutions, which are
necessary for establishing interesting properties when arrays are involved, is
described in [20]. More recently, interpolation for lemma generalization has
been replaced by more global guidance [14]. This made SPACER competitive with
other data-driven approaches that infer new lemmas based on numerical values
of blocked counterexamples. Machine Learning-based inductive generalization
has been suggested in [29]. The solver has also been extended to support
Algebraic Data Types and Recursive Functions [16]. Work on improving support
for bit-vectors [15] and experimenting with support for uninterpreted functions is
ongoing.
Abstract. Morgan and McIver’s weakest pre-expectation framework is
one of the most well-established methods for deductive verification of 
probabilistic programs. Roughly, the idea is to generalize binary state 
assertions to real-valued expectations, which can measure expected val- 
ues of probabilistic program quantities. While loop-free programs can 
be analyzed by mechanically transforming expectations, verifying loops 
usually requires finding an invariant expectation, a difficult task. 

We propose a new view of invariant expectation synthesis as a regres- 
sion problem: given an input state, predict the average value of the 
post-expectation in the output distribution. Guided by this perspective, 
we develop the first data-driven invariant synthesis method for proba- 
bilistic programs. Unlike prior work on probabilistic invariant inference, 
our approach can learn piecewise continuous invariants without relying 
on template expectations. We also develop a data-driven approach to 
learn sub-invariants from data, which can be used to upper- or lower- 
bound expected values. We implement our approaches and demonstrate 
their effectiveness on a variety of benchmarks from the probabilistic pro- 
gramming literature. 


Keywords: Probabilistic programs · Data-driven invariant learning ·
Weakest pre-expectations


1 Introduction 


Probabilistic programs—standard imperative programs augmented with a sam- 
pling command—are a common way to express randomized computations. While 
the mathematical semantics of such programs is fairly well-understood [25], ver- 
ification methods remain an active area of research. Existing automated tech- 
niques are either limited to specific properties (e.g., [3,9,35,37]), or target simpler 
computational models [4, 15, 28]. 
Reasoning About Expectations. One of the earliest methods for reasoning
about probabilistic programs is through expectations. Originally proposed by
Kozen [26], expectations generalize standard, binary assertions to quantitative,
real-valued functions on program states. Morgan and McIver further developed
this idea into a powerful framework for reasoning about probabilistic imperative
programs, called the weakest pre-expectation calculus [30,33].

Concretely, Morgan and McIver defined an operator called the weakest
pre-expectation (wpe), which takes an expectation E and a program P and produces
an expectation E' such that E'(σ) is the expected value of E in the output
distribution $[\![P]\!]_{\sigma}$. In this way, the wpe operator can be viewed as a generalization
of Dijkstra’s weakest pre-conditions calculus [16] to probabilistic programs. For
verification purposes, the wpe operator has two key strengths. First, it enables
reasoning about probabilities and expected values. Second, when P is a loop-free
program, it is possible to transform wpe(P, E) into a form that does not mention
the program P via simple, mechanical manipulations, essentially analyzing the
effect of the program on the expectation through syntactically transforming E.
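The wpe rules for loop-free programs can be sketched directly. In the Python sketch below, which is an illustration and not the paper's implementation, each command is represented by its wpe transformer over exact rationals, following the standard rules for skip, assignment, sampling, sequencing, and conditionals; the combinator names skip, assign, sample, seq, and ite are hypothetical. The sketch evaluates wpe(P, E) pointwise on a given state rather than producing the simplified symbolic expectation described above.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

State = Dict[str, Fraction]
Expectation = Callable[[State], Fraction]
# A command is represented by its wpe transformer: Expectation -> Expectation.
Command = Callable[[Expectation], Expectation]


def skip(E: Expectation) -> Expectation:
    # wpe(skip, E) = E
    return E


def assign(x: str, e: Callable[[State], Fraction]) -> Command:
    # wpe(x <- e, E) = E[x := e]
    return lambda E: (lambda s: E({**s, x: e(s)}))


def sample(x: str, dist: List[Tuple[Fraction, Fraction]]) -> Command:
    # wpe(x <-$ d, E) = sum of p * E[x := v] over (probability p, value v) in d
    return lambda E: (lambda s: sum((p * E({**s, x: v}) for p, v in dist), Fraction(0)))


def seq(c1: Command, c2: Command) -> Command:
    # wpe(P1; P2, E) = wpe(P1, wpe(P2, E))
    return lambda E: c1(c2(E))


def ite(b: Callable[[State], bool], c1: Command, c2: Command) -> Command:
    # wpe(if b then P1 else P2, E) = [b] * wpe(P1, E) + [not b] * wpe(P2, E)
    return lambda E: (lambda s: c1(E)(s) if b(s) else c2(E)(s))


half = Fraction(1, 2)
# c <-$ {0 w.p. 1/2, 1 w.p. 1/2}; if c = 1 then x <- x + 2 else skip
prog = seq(sample("c", [(half, Fraction(0)), (half, Fraction(1))]),
           ite(lambda s: s["c"] == 1, assign("x", lambda s: s["x"] + 2), skip))
expected_x = prog(lambda s: s["x"])  # wpe(prog, x)
print(expected_x({"x": Fraction(3), "c": Fraction(0)}))  # 4
```

Choosing an indicator as the post-expectation turns the same transformer into a probability: for instance, wpe(prog, [x > 3]) evaluates to 1/2 at x = 3.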

However, there is a caveat: the wpe of a loop is defined as a least fixed 
point, and it is generally difficult to simplify this quantity into a more tractable 
form. Fortunately, the wpe operator satisfies a loop rule that simplifies reasoning 
about loops: if we can find an expectation I satisfying an invariant condition,
then we can easily bound the wpe of a loop. Checking the invariant condition
involves analyzing just the body of the loop, rather than the entire loop. Thus, 
finding invariants is a primary bottleneck towards automated reasoning about 
probabilistic programs. 
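For intuition about the invariant condition, consider the hypothetical loop while b: x ← x + 1; b ←$ Bernoulli(p), with post-expectation x. The candidate I = x + [b]/(1 - p) should satisfy I = [¬b]·x + [b]·wpe(body, I), which only requires analyzing the loop body. The Python sketch below checks this equation with exact rationals on finitely many states; it is a sanity check, not a proof, and it does not model the side conditions under which the loop rule turns such an I into a bound on the wpe of the loop. The names and the value of p are assumptions.

```python
from fractions import Fraction
from typing import Callable, Dict, Iterable

State = Dict[str, Fraction]
Expectation = Callable[[State], Fraction]

P = Fraction(1, 3)  # hypothetical probability that b is resampled to 1


def wpe_body(E: Expectation) -> Expectation:
    """wpe of the loop body  x <- x + 1; b <-$ Bernoulli(P)  (b encoded as 0 or 1)."""
    def pre(s: State) -> Fraction:
        t = {**s, "x": s["x"] + 1}
        return P * E({**t, "b": Fraction(1)}) + (1 - P) * E({**t, "b": Fraction(0)})
    return pre


def satisfies_invariant_equation(I: Expectation, post: Expectation,
                                 states: Iterable[State]) -> bool:
    """Check I = [not b] * post + [b] * wpe(body, I) on each given state."""
    step = wpe_body(I)
    return all(I(s) == (step(s) if s["b"] == 1 else post(s)) for s in states)


def post(s: State) -> Fraction:
    return s["x"]  # post-expectation: final value of x


def inv(s: State) -> Fraction:
    return s["x"] + s["b"] / (1 - P)  # candidate: x + [b] / (1 - P)


def naive(s: State) -> Fraction:
    return s["x"] + s["b"]  # ignores that the loop may iterate more than once


states = [{"x": Fraction(v), "b": Fraction(b)} for v in range(-3, 4) for b in (0, 1)]
print(satisfies_invariant_equation(inv, post, states))    # True
print(satisfies_invariant_equation(naive, post, states))  # False
```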


Discovering Invariants. Two recent works have considered how to automatically 
infer invariant expectations for probabilistic loops. The first is Prinsys [21].
Using a template with one hole, Prinsys produces a first-order logical formula
describing possible substitutions satisfying the invariant condition. While effec- 
tive for their benchmark programs, the method’s reliance on templates is limit- 
ing; furthermore, the user must manually solve a system of logical formulas to 
find the invariant. 

The second work, by Chen et al. [14], focuses on inferring polynomial invari- 
ants. By restricting to this class, their method can avoid templates and can apply 
the Lagrange interpolation theorem to find a polynomial invariant. However, 
many invariants are not polynomials: for instance, an invariant may combine 
two polynomials piecewise by branching on a Boolean condition. 


Our Approach: Invariant Learning. We take a different approach inspired by 
data-driven invariant learning [17,19]. In these methods, the program is exe- 
cuted with a variety of inputs to produce a set of execution traces. This data 
is viewed as a training set, and a machine learning algorithm is used to find a 
classifier describing the invariant. Data-driven techniques reduce the reliance on 
templates, and can treat the program as a black box—the precise implementa- 
tion of the program need not be known, as long as the learner can execute the 
program to gather input and output data. But to extend the data-driven method
to the probabilistic setting, there are a few key challenges: 


— Quantitative invariants. While the logic of expectations resembles the logic 
of standard assertions, an important difference is that expectations are quan- 
titative: they map program states to real numbers, not a binary yes/no. While 
standard invariant learning is a classification task (i.e., predicting a binary 
label given a program state), our probabilistic invariant learning is closer to 
a regression task (i.e., predicting a number given a program state). 

— Stochastic data. Standard invariant learning assumes the program behaves 
like a function: a given input state always leads to the same output state. In 
contrast, a probabilistic program takes an input state to a distribution over 
outputs. Since we are only able to observe a single draw from the output 
distribution each time we run the program, execution traces in our setting 
are inherently noisy. Accordingly, we cannot hope to learn an invariant that 
fits the observed data perfectly, even if the program has an invariant—our 
learner must be robust to noisy training data. 

— Complex learning objective. To fit a probabilistic invariant to data, the 
logical constraints defining an invariant must be converted into a regression 
problem with a loss function suitable for standard machine learning algo- 
rithms and models. While typical regression problems relate the unknown 
quantity to be learned to known data, the conditions defining invariants are 
self-referential: they describe how an unknown invariant must be
related to itself. This feature makes casting invariant learning as machine 
learning a difficult task. 


Outline. After covering preliminaries (Sect. 2), we present our contributions.


— A general method called EXIST for learning invariants for probabilistic programs (Sect. 3). EXIST executes the program multiple times on a set of input
states, and then uses machine learning algorithms to learn models encod- 
ing possible invariants. A CEGIS-like loop is used to iteratively expand the 
dataset after encountering incorrect candidate invariants. 

— Concrete instantiations of EXIST tailored for handling two problems: learning 
exact invariants (Sect. 4), and learning sub-invariants (Sect. 5). Our method
for exact invariants learns a model tree [34], a generalization of binary decision 
trees to regression. The constraints for sub-invariants are more difficult to 
encode as a regression problem, and our method learns a neural model tree [41] 
with a custom loss function. While the models differ, both algorithms leverage 
off-the-shelf learning algorithms. 

— An implementation of EXIST and a thorough evaluation on a large set of 
benchmarks (Sect. 6). Our tool can learn invariants and sub-invariants for
examples considered in prior work and new, more difficult versions that are 
beyond the reach of prior work. 


We discuss related work in Sect. 7. 
2 Preliminaries


Probabilistic Programs. We will consider programs written in pWhile, a basic probabilistic imperative language with the following grammar:

$$P ::= \mathsf{skip} \mid x \leftarrow e \mid x \xleftarrow{\$} d \mid P ; P \mid \mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ P \mid \mathsf{while}\ e : P,$$

where e is a boolean or numerical expression and d is a distribution. All commands P map memories to distributions over memories [25]; the semantics is entirely standard and can be found in the extended version. We write $[\![P]\!]_\sigma$ for the output distribution of program P from initial state σ. Since we will be interested in running programs on concrete inputs, we will assume throughout that all loops are almost surely terminating; this property can often be established by other methods (e.g., [12,13,31]).


Weakest Pre-expectation Calculus. Morgan and McIver's weakest pre-expectation calculus reasons about probabilistic programs by manipulating expectations.


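Definition 1. Denote the set of program states by $\Sigma$. Define the set of expectations $\mathcal{E}$ to be $\{E \mid E : \Sigma \to \mathbb{R}^{\infty}_{\geq 0}\}$. Define $E_1 \leq E_2$ iff $\forall \sigma \in \Sigma : E_1(\sigma) \leq E_2(\sigma)$. The set $\mathcal{E}$ is a complete lattice.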


While expectations are technically mathematical functions from $\Sigma$ to the non-negative extended reals, for formal reasoning it is convenient to work with a more restricted syntax of expectations (see, e.g., [8]). We will often view numeric expressions as expectations. Boolean expressions b can also be converted to expectations; we let [b] be the expectation that maps states where b holds to 1, and other states to 0. As an example of our notation, $[\mathit{flip} = 0] \cdot (x + 1)$ and $x + 1$ are two expectations, and we have $[\mathit{flip} = 0] \cdot (x + 1) \leq x + 1$.
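As a small illustration (our own encoding, not the paper's), a program state can be modelled as a Python dict from variable names to numbers and an expectation as a function from states to non-negative reals; the helper names `iverson` and `leq_on` are hypothetical. The sketch encodes the two example expectations above and checks the pointwise order on finitely many sample states, which can refute $E_1 \leq E_2$ but never prove it.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, float]
Expectation = Callable[[State], float]


def iverson(pred: Callable[[State], bool]) -> Expectation:
    """[b]: maps states where b holds to 1 and all other states to 0."""
    return lambda s: 1.0 if pred(s) else 0.0


def leq_on(e1: Expectation, e2: Expectation, states: Iterable[State]) -> bool:
    """Pointwise order E1 <= E2, checked only on the given sample states."""
    return all(e1(s) <= e2(s) for s in states)


flip_is_zero = iverson(lambda s: s["flip"] == 0)


def e_guarded(s: State) -> float:  # [flip = 0] * (x + 1)
    return flip_is_zero(s) * (s["x"] + 1)


def e_plain(s: State) -> float:  # x + 1
    return s["x"] + 1


samples = [{"flip": f, "x": x} for f in (0, 1) for x in range(5)]
print(leq_on(e_guarded, e_plain, samples))  # True
print(leq_on(e_plain, e_guarded, samples))  # False: they differ when flip = 1
```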


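$$
\begin{aligned}
\mathrm{wpe}(\mathsf{skip}, E) &:= E\\
\mathrm{wpe}(x \leftarrow e, E) &:= E[e/x]\\
\mathrm{wpe}(x \xleftarrow{\$} d, E) &:= \lambda\sigma.\ \textstyle\sum_{v \in V} [\![d]\!](v) \cdot E[v/x](\sigma)\\
\mathrm{wpe}(P ; Q, E) &:= \mathrm{wpe}(P, \mathrm{wpe}(Q, E))\\
\mathrm{wpe}(\mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ Q, E) &:= [e] \cdot \mathrm{wpe}(P, E) + [\neg e] \cdot \mathrm{wpe}(Q, E)\\
\mathrm{wpe}(\mathsf{while}\ e : P, E) &:= \mathrm{lfp}\big(\lambda X.\ [e] \cdot \mathrm{wpe}(P, X) + [\neg e] \cdot E\big)
\end{aligned}
$$

Fig. 1. Morgan and McIver's weakest pre-expectation operator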
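For loop-free programs, the rules of Fig. 1 can be run directly once expectations are Python functions on states. The sketch below assumes a hypothetical encoding (one small dataclass per command, and distributions given as finite lists of (value, probability) pairs); it omits the while rule, whose least fixed point cannot in general be obtained by finite unfolding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

State = Dict[str, float]
Expectation = Callable[[State], float]


@dataclass
class Skip:
    pass


@dataclass
class Assign:  # x <- e
    var: str
    expr: Callable[[State], float]


@dataclass
class Sample:  # x <-$ d, with d a finite list of (value, probability) pairs
    var: str
    dist: Callable[[State], List[Tuple[float, float]]]


@dataclass
class Seq:  # P ; Q
    first: "Cmd"
    second: "Cmd"


@dataclass
class If:  # if e then P else Q
    cond: Callable[[State], bool]
    then: "Cmd"
    orelse: "Cmd"


Cmd = Union[Skip, Assign, Sample, Seq, If]


def wpe(cmd: Cmd, post: Expectation) -> Expectation:
    """Weakest pre-expectation of a loop-free command, following Fig. 1."""
    if isinstance(cmd, Skip):
        return post
    if isinstance(cmd, Assign):  # E[e/x]
        return lambda s: post({**s, cmd.var: cmd.expr(s)})
    if isinstance(cmd, Sample):  # sum over v of d(v) * E[v/x]
        return lambda s: sum(pr * post({**s, cmd.var: v}) for v, pr in cmd.dist(s))
    if isinstance(cmd, Seq):
        return wpe(cmd.first, wpe(cmd.second, post))
    if isinstance(cmd, If):  # [e] * wpe(P, E) + [not e] * wpe(Q, E)
        then_pre, else_pre = wpe(cmd.then, post), wpe(cmd.orelse, post)
        return lambda s: then_pre(s) if cmd.cond(s) else else_pre(s)
    raise TypeError(f"unsupported command: {cmd!r}")


# Body of the running example of Sect. 4: n <- n + 1; x <-$ Bernoulli(p).
body = Seq(Assign("n", lambda s: s["n"] + 1),
           Sample("x", lambda s: [(1, s["p"]), (0, 1 - s["p"])]))
pre = wpe(body, lambda s: s["n"] + (1 / s["p"] if s["x"] == 0 else 0))
print(pre({"n": 3, "x": 0, "p": 0.25}))  # 7.0
```

Here `pre` evaluates wpe(n ← n + 1; x ←$ Bernoulli(p), n + [x = 0]·(1/p)) at one state; the result 3 + 1/0.25 = 7 coincides with the value of n + [x = 0]·(1/p) itself at that state.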


Now, we are ready to introduce Morgan and McIver's weakest pre-expectation transformer wpe. In a nutshell, this operator takes a program P and an expectation E to another expectation E', sometimes called the pre-expectation. Formally, wpe is defined in Fig. 1. The case for loops involves the least fixed point (lfp) of $\Phi^{\mathrm{wpe}}_{E} := \lambda X.\ ([e] \cdot \mathrm{wpe}(P, X) + [\neg e] \cdot E)$, the characteristic function of the loop with respect to wpe [23]. The characteristic function is monotone on the complete lattice $\mathcal{E}$, so its least fixed point exists by the Knaster–Tarski theorem; since it is moreover continuous, the Kleene fixed-point theorem characterizes this least fixed point as the supremum of the iterates of $\Phi^{\mathrm{wpe}}_{E}$ starting from the zero expectation.

The key property of the wpe transformer is that for any program P, $\mathrm{wpe}(P, E)(\sigma)$ is the expected value of E over the output distribution $[\![P]\!]_\sigma$.

Theorem 1 (See, e.g., [23]). For any program P and expectation $E \in \mathcal{E}$,
$$\mathrm{wpe}(P, E) = \lambda\sigma.\ \sum_{\sigma' \in \Sigma} E(\sigma') \cdot [\![P]\!]_\sigma(\sigma').$$

Intuitively, the weakest pre-expectation calculus provides a syntactic way to 
compute the expected value of an expression E after running a program P, 
except when the program is a loop. For a loop, the least fixed point definition 
of wpe(while e : P, E) is hard to compute. 
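To see why the loop case is the hard one, the sketch below (an illustration only, not part of the method) applies the characteristic function of the loop while x = 0: n ← n + 1; x ←$ Bernoulli(p) with post-expectation n, which is the running example of Sect. 4, starting from the zero expectation. Each iterate is a finite unfolding of the loop; the iterates increase towards the least fixed point n + [x = 0]·(1/p), but at states with x = 0 they stay strictly below it for every finite number of steps. Encoding expectations as Python functions of the state (x, n), with p fixed, is an assumption of the sketch.

```python
from typing import Callable

Expectation = Callable[[int, int], float]  # a function of the state (x, n); p is fixed


def char_fun(p: float) -> Callable[[Expectation], Expectation]:
    """Phi(X) = [x = 0] * wpe(body, X) + [x != 0] * n for the loop
    while x = 0: n <- n + 1; x <-$ Bernoulli(p), with postE = n."""
    def phi(X: Expectation) -> Expectation:
        def pre(x: int, n: int) -> float:
            if x != 0:
                return float(n)  # guard false: the value of postE
            # guard true: n <- n + 1, then x is 1 w.p. p and 0 otherwise
            return p * X(1, n + 1) + (1 - p) * X(0, n + 1)
        return pre
    return phi


def kleene_iterate(p: float, k: int) -> Expectation:
    """Return Phi^k(0), the k-th Kleene iterate from the zero expectation."""
    phi = char_fun(p)
    X: Expectation = lambda x, n: 0.0
    for _ in range(k):
        X = phi(X)
    return X


for k in (1, 2, 3, 10, 30):
    print(k, kleene_iterate(0.5, k)(0, 0))
# Increasing values 0.0, 0.5, 1.0, ... approaching 1/p = 2.0 from below.
```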


3 Algorithm Overview 


In this section, we introduce the two related problems we aim to solve, and 
a meta-algorithm to tackle both of them. We will see how to instantiate the 
meta-algorithm’s subroutines in Sect. 4 and Sect. 5. 


Problem Statement. Analogous to when analyzing the weakest pre-conditions of 
a loop, knowing a loop invariant or sub-invariant expectation enables one to eas- 
ily bound the loop’s weakest pre-expectations, but a (sub)invariant expectation 
can be difficult to find. Thus, we aim to develop an algorithm to automatically 
synthesize invariants and sub-invariants of probabilistic loops. More specifically, 
our algorithm tackles the following two problems: 


1. Finding exact invariants: Given a loop while G : P and an expectation postE as input, we want to find an expectation I such that
$$I = \Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I) := [G] \cdot \mathrm{wpe}(P, I) + [\neg G] \cdot \mathrm{postE}. \tag{1}$$
Such an expectation I is an exact invariant of the loop with respect to postE. Since wpe(while G : P, postE) is a fixed point of $\Phi^{\mathrm{wpe}}_{\mathrm{postE}}$, wpe(while G : P, postE) has to be an exact invariant of the loop. Furthermore, when while G : P is almost surely terminating and postE is upper-bounded, the existence of an exact invariant I implies I = wpe(while G : P, postE). (We defer the proof to the extended version.)
2. Finding sub-invariants: Given a loop while G : P and expectations preE, postE, we aim to learn an expectation I such that
$$I \leq \Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I) := [G] \cdot \mathrm{wpe}(P, I) + [\neg G] \cdot \mathrm{postE}, \tag{2}$$
$$\mathrm{preE} \leq I. \tag{3}$$
The first inequality says that I is a sub-invariant: on states that satisfy G, the value of I lower-bounds the expected value of I itself after running one loop iteration from that state, and on states that violate G, the value of I lower-bounds the value of postE. Any sub-invariant lower-bounds the weakest pre-expectation of the loop, i.e., I ≤ wpe(while G : P, postE) [22]. Together with the second inequality preE ≤ I, the existence of a sub-invariant I ensures that preE lower-bounds the weakest pre-expectation.
Note that an exact invariant is a sub-invariant, so one indirect way to solve the second problem is to solve the first problem and then check preE ≤ I. However, we aim to find a more direct approach to the second problem, because exact invariants can often be complicated and hard to find, while sub-invariants can be simpler and easier to find.


EXIST(geo, pexp, N_runs, N_states):
    feat ← getFeatures(geo, pexp)
    states ← sampleStates(feat, N_states)
    data ← sampleTraces(geo, pexp, feat, N_runs, states)
    while not timed out:
        models ← learnInv(feat, data)
        candidates ← extractInv(models)
        for inv in candidates:
            verified, cex ← verifyInv(inv, geo)
            if verified:
                return inv
            else:
                states ← states ∪ cex
        states ← states ∪ sampleStates(feat, N_states)
        data ← data ∪ sampleTraces(geo, pexp, feat, N_runs, states)

Fig. 2. Algorithm EXIST
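The loop of Fig. 2 can be phrased as a higher-order Python function in which the five subroutines are supplied by the caller. This is a sketch of the control flow only, not the released implementation; the parameter names (`get_features`, `sample_states`, and so on) are assumptions that mirror the pseudocode. States are kept in lists rather than sets because states represented as dicts are not hashable.

```python
import time
from typing import Any, Callable, List, Optional, Tuple


def exist(geo: Any, pexp: Any, n_runs: int, n_states: int,
          get_features: Callable[[Any, Any], Any],
          sample_states: Callable[[Any, int], List[Any]],
          sample_traces: Callable[[Any, Any, Any, int, List[Any]], List[Any]],
          learn_inv: Callable[[Any, List[Any]], Any],
          extract_inv: Callable[[Any], List[Any]],
          verify_inv: Callable[[Any, Any], Tuple[bool, List[Any]]],
          timeout_s: float = 300.0) -> Optional[Any]:
    """CEGIS-style loop of Fig. 2; returns a verified candidate or None on timeout."""
    feat = get_features(geo, pexp)
    states = list(sample_states(feat, n_states))
    data = list(sample_traces(geo, pexp, feat, n_runs, states))
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        models = learn_inv(feat, data)
        for inv in extract_inv(models):
            verified, cex = verify_inv(inv, geo)
            if verified:
                return inv  # sound: only verified candidates are returned
            states.extend(cex)  # counterexamples become new initial states
        # Fresh random states, then re-sample traces as in the last line of Fig. 2.
        states.extend(sample_states(feat, n_states))
        data.extend(sample_traces(geo, pexp, feat, n_runs, states))
    return None
```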


Methods. We solve both problems with one algorithm, EXIST (short for EXpectation Invariant SynThesis). Our data-driven method resembles Counterexample Guided Inductive Synthesis (CEGIS), but differs in two ways. First, candidates are synthesized by fitting a machine learning model to data consisting of program traces starting from random input states. Our target programs are also probabilistic, introducing a second source of randomness to program traces. Second, our approach seeks high-quality counterexamples, that is, counterexamples violating the target constraints as much as possible, in order to improve synthesis. For synthesizing invariants and sub-invariants, such counterexamples can be generated by using a computer algebra system to solve an optimization problem.

We present the pseudocode in Fig. 2. EXIST takes a probabilistic program geo, a post-expectation or a pair of pre/post-expectations pexp, and hyper-parameters $N_{runs}$ and $N_{states}$. EXIST starts by generating a list of features feat, which are numerical expressions formed by program variables used in geo. Next, EXIST samples $N_{states}$ initial states, runs geo from each of those states for $N_{runs}$ trials, and records the value of feat on program traces as data. Then, EXIST enters a CEGIS loop. In each iteration of the loop, first the learner learnInv trains models to minimize their violation of the required inequalities (e.g., Eqs. (2) and (3) for learning sub-invariants) on data. Next, extractInv translates the learned models into a set candidates of expectations. For each candidate inv, the verifier verifyInv looks for program states that maximize inv's violation of the required inequalities. If it cannot find any program state where inv violates the inequalities, the verifier returns inv as a valid invariant or sub-invariant. Otherwise, it produces a set cex of counterexample program states, which are added to the set of initial states. Finally, before entering the next iteration, the algorithm augments states with a new batch of $N_{states}$ initial states, generates trace data from running geo on each of these states for $N_{runs}$ trials, and augments the dataset data. This data augmentation ensures that the synthesis algorithm collects more and more initial states, some randomly generated (sampleStates) and some from prior counterexamples (cex), guiding the learner towards better candidates. Like other CEGIS-based tools, our method is sound but not complete: if the algorithm returns an expectation, then it is guaranteed to be an exact invariant or sub-invariant, but the algorithm might never return an answer; in practice, we set a timeout.


4 Learning Exact Invariants 


In this section, we detail how we instantiate EXIST's subroutines to learn an exact invariant I satisfying $I = \Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I)$, given a loop geo and an expectation pexp = postE.

At a high level, we first sample a set of program states states using sampleStates. From each program state s ∈ states, sampleTraces executes geo and estimates wpe(geo, postE)(s). Next, learnInv trains regression models M to predict the estimated wpe(geo, postE)(s) given the value of features evaluated on s. Then, extractInv translates the learned models M to an expectation I. In an ideal scenario, this I would be equal to wpe(geo, postE), which is always an exact invariant. But since I is learned from stochastic data, it may be noisy. So, we use verifyInv to check whether I satisfies the invariant condition $I = \Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I)$.

The reader may wonder why we take this roundabout approach, first estimating the weakest pre-expectation of the loop and then computing the invariant: if we are able to learn an expression for wpe(geo, postE) directly, why are we interested in the invariant I? The answer is that with an invariant I, we can also verify that our computed value of wpe(geo, postE) is correct by checking the invariant condition and applying the loop rule. Since our learning process is inherently noisy, this verification step is crucial and motivates why we want to find an invariant.
Fig. 3. Running example: program and model tree. (a) Program geo: while x = 0: n ← n + 1; x ←$ Bernoulli(p). (b) Model tree for wpe(geo, n). (c) Another model tree. (Both trees test x = 0 at the root.)


A Running Example. We will illustrate our approach using Fig. 3. The simple 
program geo repeatedly loops: whenever x becomes non-zero we exit the loop; 
otherwise we increase n by 1 and draw x from a biased coin-flip distribution (x 
gets 1 with probability p, and 0 otherwise). We aim to learn wpe(geo, n), which is $[x \neq 0] \cdot n + [x = 0] \cdot (n + \frac{1}{p})$.


Our Regression Model. Before getting into how EXIST collects data and trains 
models, we introduce the class of regression models it uses: model trees, a
generalization of decision trees to regression tasks [34]. Model trees are naturally 
suited to expressing piecewise functions of inputs, and are straightforward to 
train. While our method can in theory generalize to other regression models, our 
implementation focuses on model trees. 

More formally, a model tree $T \in \mathcal{T}$ over features F is a full binary tree where each internal node is labeled with a predicate φ over variables from F, and each leaf is labeled with a real-valued model $M \in \mathcal{M} : \mathbb{R}^{F} \to \mathbb{R}$. Given a feature vector $x \in \mathbb{R}^{F}$, a model tree T over F produces a numerical output $T(x) \in \mathbb{R}$ as follows:

— If T is of the form Leaf(M), then T(x) := M(x).
— If T is of the form Node(φ, T_L, T_R), then T(x) := T_L(x) if the predicate φ evaluates to true on x, and T(x) := T_R(x) otherwise.
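The two cases above translate directly into a recursive data type. In the sketch below, `Leaf`, `Node`, and `evaluate` are hypothetical names, features are passed as a dict, and `inv_p` stands for the feature 1/p; the example tree, in the spirit of Fig. 3b, splits on x = 0 and uses the linear leaves n + 1/p and n, which is one way to express wpe(geo, n).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Features = Dict[str, float]


@dataclass
class Leaf:
    model: Callable[[Features], float]  # M: R^F -> R


@dataclass
class Node:
    pred: Callable[[Features], bool]  # predicate phi over features
    left: "Tree"   # T_L, used when phi holds
    right: "Tree"  # T_R, used otherwise


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, x: Features) -> float:
    """T(x) as defined above: follow predicates down to a leaf, then apply its model."""
    while isinstance(tree, Node):
        tree = tree.left if tree.pred(x) else tree.right
    return tree.model(x)


geo_tree = Node(pred=lambda f: f["x"] == 0,
                left=Leaf(lambda f: 1.0 * f["n"] + 1.0 * f["inv_p"]),
                right=Leaf(lambda f: 1.0 * f["n"]))
print(evaluate(geo_tree, {"x": 0, "n": 2, "inv_p": 4.0}))  # 6.0
print(evaluate(geo_tree, {"x": 1, "n": 2, "inv_p": 4.0}))  # 2.0
```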


Throughout this paper, we consider model trees of the following form as our regression model. First, node predicates φ are of the form $f \bowtie c$, where $f \in F$ is a feature, $\bowtie \in \{<, \leq, =, \geq, >\}$ is a comparison, and c is a numeric constant. Second, the leaf models of a model tree are either all linear models or all products of constant powers of features, which we call multiplication models. For example, assuming n and 1/p are both features, Fig. 3b and c are two model trees with linear leaf models, and Fig. 3b expresses the weakest pre-expectation wpe(geo, n). Formally, the leaf model M on a feature vector f is either

$$M_l(f) = \sum_{i=1}^{|F|} a_i \cdot f_i \qquad \text{or} \qquad M_m(f) = \prod_{i=1}^{|F|} f_i^{a_i}$$

with constants $\{a_i\}_i$. Note that multiplication models can also be viewed as linear models on logarithmic values of features, because $\log M_m(f) = \sum_{i=1}^{|F|} a_i \cdot \log(f_i)$. While it is also straightforward to adapt our method to other leaf models, we focus on linear models and multiplication models because of their simplicity and expressiveness. Linear models and multiplication models also complement each other in their expressiveness: encoding expressions like x + y uses simpler features with linear models (it suffices if F ∋ x, y, as opposed to needing F ∋ x + y if using multiplication models), while encoding $\frac{p}{1-p}$ uses simpler features with multiplication models (it suffices if F ∋ p, 1 − p, as opposed to needing $F \ni \frac{p}{1-p}$ if using linear models).
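The following sketch (function names are ours) evaluates both leaf-model forms and checks the logarithmic identity numerically. With features p and 1 − p, the exponents (1, −1) give the ratio p/(1 − p), which a linear leaf could express only if that ratio were itself a feature. Multiplication models assume strictly positive features so that the logarithms are defined.

```python
import math
from typing import Sequence


def linear_model(a: Sequence[float], f: Sequence[float]) -> float:
    """M_l(f) = sum_i a_i * f_i."""
    return sum(ai * fi for ai, fi in zip(a, f))


def mult_model(a: Sequence[float], f: Sequence[float]) -> float:
    """M_m(f) = prod_i f_i ** a_i (features assumed strictly positive)."""
    return math.prod(fi ** ai for ai, fi in zip(a, f))


p = 0.3
features = [p, 1 - p]  # F contains p and 1 - p
a = [1.0, -1.0]        # exponents giving p / (1 - p)
print(mult_model(a, features))  # approximately 0.3 / 0.7
# log M_m(f) is the linear model with the same coefficients on log-features:
print(math.isclose(math.log(mult_model(a, features)),
                   linear_model(a, [math.log(fi) for fi in features])))  # True
```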


4.1 Generate Features (getFeatures) 


Given a program, the algorithm first generates a set of features F that model trees can use to express unknown invariants of the given loop. For example, for geo, $I = [x \neq 0] \cdot n + [x = 0] \cdot (n + \frac{1}{p})$ is an invariant, and to have a model tree (with linear or multiplication leaf models) express I, we want F to include both n and $\frac{1}{p}$, or $n + \frac{1}{p}$ as one feature. F should include the program variables at a minimum, but it is often useful to have more complex features too. While generating more features increases the expressivity of the models and the richness of the invariants, there is a cost: the more features in F, the more data is needed to train a model.

Starting from the program variables, getFeatures generates two lists of features, $F_l$ for linear leaf models and $F_m$ for multiplication leaf models. Intuitively, linear models are more expressive if the feature set F includes some products of terms, e.g., $n \cdot p$, and multiplication models are more expressive if F includes some sums of terms, e.g., n + 1.
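The text does not fix how the extra features are chosen. One plausible strategy consistent with the intuition above (an assumption, not necessarily what the tool does) is to add pairwise products to the linear feature list and pairwise sums to the multiplicative one, as in this hypothetical sketch over feature names:

```python
from itertools import combinations
from typing import List, Tuple


def get_features(program_vars: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical feature generation: returns (F_l, F_m) as expression strings.

    F_l: program variables plus pairwise products (useful for linear leaves).
    F_m: program variables plus pairwise sums (useful for multiplicative leaves).
    """
    pairs = list(combinations(program_vars, 2))
    f_lin = program_vars + [f"{u}*{v}" for u, v in pairs]
    f_mul = program_vars + [f"{u}+{v}" for u, v in pairs]
    return f_lin, f_mul


f_lin, f_mul = get_features(["n", "p"])
print(f_lin)  # ['n', 'p', 'n*p']
print(f_mul)  # ['n', 'p', 'n+p']
```

The quadratic growth in the number of pairs illustrates the cost noted above: every added feature increases the amount of data needed to train a model.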


4.2 Sample Initial States (sampleStates) 


Recall that EXIST aims to learn an expectation I that is equal to the weakest pre-expectation wpe(while G : P, postE). A natural idea for sampleTraces is to run the program from all possible initializations multiple times, and record the average value of postE from each initialization. This would give a map close to wpe(while G : P, postE) if we run enough trials so that the empirical mean is approximately the actual mean. However, this strategy is clearly impractical: many of the programs we consider have infinitely many possible initial states (e.g., programs with integer variables). Thus, sampleStates needs to choose a manageable number of initial states for sampleTraces to use.

In principle, a good choice of initializations should exercise as many parts of the program as possible. For instance, for geo in Fig. 3, if we only try initial states satisfying x ≠ 0, then it is impossible to learn the term $[x = 0] \cdot (n + \frac{1}{p})$ in wpe(geo, n) from data. However, covering the control-flow graph may not be enough. Ideally, to learn how the expected value of postE depends on the initial state, we also want data from multiple initial states along each path.

While it is unclear how to choose initializations to ensure optimal coverage, our implementation uses a simple strategy: sampleStates generates $N_{states}$ states in total, each by sampling the value of every program variable uniformly at random from a space. We assume program variables are typed as booleans, integers, probabilities, or floating-point numbers, and sample each variable from the space corresponding to its type. For boolean variables, the sampling space is simply {0, 1}; for probability variables, the space includes reals in some interval bounded away from 0 and 1, because probabilities too close to 0 or 1 tend to increase the variance of programs (e.g., making some loops iterate for a very long time); for floating-point and integer variables, the spaces are respectively reals and integers in some bounded range. This strategy, while simple, is already very effective in nearly all of our benchmarks (see Sect. 6), though other strategies are certainly possible (e.g., performing a grid search of initial states from some space).
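A minimal sketch of this sampling strategy follows. The type names and the concrete ranges (probabilities in [0.1, 0.9], integers in [0, 10], floats in [−10, 10]) are assumptions for illustration, since the text only says the ranges are bounded and, for probabilities, bounded away from 0 and 1. Unlike Fig. 2, this hypothetical `sample_states` takes a map from variables to types rather than the feature list.

```python
import random
from typing import Dict, List

# Hypothetical ranges; the paper leaves the concrete bounds unspecified.
PROB_RANGE = (0.1, 0.9)
INT_RANGE = (0, 10)
FLOAT_RANGE = (-10.0, 10.0)


def sample_states(var_types: Dict[str, str], n_states: int,
                  rng: random.Random) -> List[Dict[str, float]]:
    """Draw each variable independently and uniformly from the space of its type."""
    samplers = {
        "bool": lambda: rng.randint(0, 1),
        "prob": lambda: rng.uniform(*PROB_RANGE),
        "int": lambda: rng.randint(*INT_RANGE),  # inclusive bounds
        "float": lambda: rng.uniform(*FLOAT_RANGE),
    }
    return [{v: samplers[t]() for v, t in var_types.items()} for _ in range(n_states)]


states = sample_states({"x": "bool", "n": "int", "p": "prob"}, 5, random.Random(0))
for s in states:
    print(s)
```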


4.3 Sample Training Data (sampleTraces) 


We gather training data by running the given program geo on the set of initializations generated by sampleStates. From each program state $s_i \in$ states, the subroutine sampleTraces runs geo $N_{runs}$ times to get output states $\{s_{i1}, \ldots, s_{iN_{runs}}\}$ and produces the following training example:

$$(s_i, v_i) = \Big(s_i,\ \frac{1}{N_{runs}} \sum_{j=1}^{N_{runs}} \mathrm{postE}(s_{ij})\Big).$$

Above, the value $v_i$ is the empirical mean of postE in the output states of running geo from initial state $s_i$; as $N_{runs}$ grows large, this average approaches the true expected value $\mathrm{wpe}(\mathit{geo}, \mathrm{postE})(s_i)$.
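The sketch below builds such training examples for geo, whose loop is simulated directly in Python; `run_geo` and `sample_traces` are hypothetical helpers, and a seeded `random.Random` makes runs reproducible. With 20 000 runs from the state x = 0, n = 3, p = 0.5, the empirical mean should land close to the exact value 3 + 1/p = 5.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def run_geo(s: State, rng: random.Random) -> State:
    """One execution of geo: while x = 0: n <- n + 1; x <-$ Bernoulli(p)."""
    s = dict(s)
    while s["x"] == 0:
        s["n"] += 1
        s["x"] = 1 if rng.random() < s["p"] else 0
    return s


def sample_traces(run: Callable[[State, random.Random], State],
                  post_e: Callable[[State], float],
                  states: List[State], n_runs: int,
                  rng: random.Random) -> List[Tuple[State, float]]:
    """Training examples (s_i, v_i), v_i = empirical mean of postE over n_runs runs."""
    data = []
    for s in states:
        v = sum(post_e(run(s, rng)) for _ in range(n_runs)) / n_runs
        data.append((s, v))
    return data


rng = random.Random(1)
data = sample_traces(run_geo, lambda s: s["n"], [{"x": 0, "n": 3, "p": 0.5}], 20000, rng)
print(data[0][1])  # close to the exact value 3 + 1/p = 5.0
```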


4.4 Learning a Model Tree (learnInv) 


Now that we have the training set data = $\{(s_1, v_1), \ldots, (s_K, v_K)\}$ (where $K = N_{states}$), we want to fit a model tree T to the data. We aim to apply off-the-shelf tools that can learn model trees with customizable leaf models and loss. For each data entry, $v_i$ approximates wpe(geo, postE)($s_i$), so a natural idea is to train a model tree T that takes the value of features on $s_i$ as input and predicts $v_i$. To achieve that, we want to define the loss to measure the error between the predicted values $T(F_l(s_i))$ (or $T(F_m(s_i))$) and the target value $v_i$. Without loss of generality, we can assume our invariant I is of the form
$$I = \mathrm{postE} + [G] \cdot T \tag{4}$$
because I being an invariant means
$$I = [\neg G] \cdot \mathrm{postE} + [G] \cdot \mathrm{wpe}(P, I) = \mathrm{postE} + [G] \cdot (\mathrm{wpe}(P, I) - \mathrm{postE}).$$


In many cases, the expectation I' = wpe(P, I) − postE is simpler than I. For example, the weakest pre-expectation of geo can be expressed as $n + [x = 0] \cdot \frac{1}{p}$; while I is represented by a tree that splits on the predicate x = 0 and needs both n and $\frac{1}{p}$ as features, the expectation $I' = \frac{1}{p}$ is represented by a single-leaf model tree that only needs p as a feature.

Aiming to learn weakest pre-expectations I in the form of Eq. (4), EXIST trains model trees T to fit I'. More precisely, learnInv trains a model tree $T_l$ with linear leaf models over features $F_l$ by minimizing the loss
$$\mathrm{err}_l(T_l, \mathit{data}) = \Big(\sum_{i=1}^{K} \big(\mathrm{postE}(s_i) + G(s_i) \cdot T_l(F_l(s_i)) - v_i\big)^2\Big)^{1/2} \tag{5}$$
where $\mathrm{postE}(s_i)$ and $G(s_i)$ represent the values of the expectations postE and [G] evaluated on the state $s_i$. This loss measures the error (the square root of the summed squared differences) between the prediction $\mathrm{postE}(s_i) + G(s_i) \cdot T_l(F_l(s_i))$ and the target $v_i$. Note that when the guard G is false on an initial state $s_i$, the example contributes zero to the loss because $\mathrm{postE}(s_i) + G(s_i) \cdot T_l(F_l(s_i)) = \mathrm{postE}(s_i) = v_i$; thus, we only need to generate and collect trace data for initial states where the guard G is true.
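Eq. (5) can be computed directly once a tree is available as a function of the state. In this sketch (hypothetical names; feature extraction is folded into the tree function), the targets are the exact values of wpe(geo, n) rather than noisy estimates, so the choice I' = 1/p gives zero loss while a constant leaf does not.

```python
import math
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def err_l(tree: Callable[[State], float], data: List[Tuple[State, float]],
          post_e: Callable[[State], float],
          guard: Callable[[State], bool]) -> float:
    """Eq. (5): sqrt(sum_i (postE(s_i) + G(s_i) * T(F(s_i)) - v_i) ** 2).

    `tree` maps a state to T(F(s)); feature extraction is folded into it.
    """
    total = 0.0
    for s, v in data:
        g = 1.0 if guard(s) else 0.0
        total += (post_e(s) + g * tree(s) - v) ** 2
    return math.sqrt(total)


def post_n(s: State) -> float:  # postE = n
    return s["n"]


def guard_x0(s: State) -> bool:  # G: x = 0
    return s["x"] == 0


# Exact targets v = wpe(geo, n)(s) = n + [x = 0] / p stand in for noisy estimates.
data = [({"x": x, "n": n, "p": p}, n + (1 / p if x == 0 else 0.0))
        for x in (0, 1) for n in (0, 3) for p in (0.25, 0.5)]
print(err_l(lambda s: 1 / s["p"], data, post_n, guard_x0))  # 0.0
print(err_l(lambda s: 2.0, data, post_n, guard_x0) > 0)      # True
```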

Analogously, learnInv trains a model tree $T_m$ with multiplication leaf models over features $F_m$ to minimize the loss $\mathrm{err}_m(T_m, \mathit{data})$, which is the same as $\mathrm{err}_l(T_l, \mathit{data})$ except that $T_l(F_l(s_i))$ is replaced by $T_m(F_m(s_i))$ for each i.


4.5 Extracting Expectations from Models (extractInv) 


Given the learned model trees $T_l$ and $T_m$, we extract expectations that approximate wpe(geo, postE) in three steps:

1. Round $T_l$, $T_m$ with different precisions. Since we obtain the model trees $T_l$ and $T_m$ by learning and the training data is stochastic, the coefficients of features in $T_l$ and $T_m$ may be slightly off. We apply several rounding schemes to generate a list of rounded model trees.

2. Translate into expectations. Since we learn model trees, this step is straightforward: for example, $n + \frac{1}{p}$ can be seen as a model tree (with only a leaf) mapping the values of the features n and $\frac{1}{p}$ to a number, or as an expectation mapping program states where n and p are program variables to a number. We translate each model tree obtained from the previous step to an expectation.

3. Form the candidate invariants. Since we train the model trees to fit I' so that postE + [G]·I' approximates wpe(while G : P, postE), we construct each candidate invariant inv ∈ invs by replacing I' in the pattern postE + [G]·I' by an expectation obtained in the second step.
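The paper does not list its rounding schemes. The following sketch shows one plausible choice for a single learned coefficient: rounding to a few decimal places and snapping to nearby fractions with small denominators via Python's `fractions.Fraction.limit_denominator`. The function name and parameter defaults are assumptions, and a full implementation would apply such variants to every coefficient of a tree.

```python
from fractions import Fraction
from typing import List, Sequence


def rounding_variants(coef: float, decimals: Sequence[int] = (0, 1, 2),
                      max_denoms: Sequence[int] = (2, 4, 10)) -> List[float]:
    """Hypothetical rounding schemes for one learned coefficient: round to a few
    decimal places, and snap to the closest fraction with a bounded denominator."""
    variants = {round(coef, d) for d in decimals}
    variants |= {float(Fraction(coef).limit_denominator(m)) for m in max_denoms}
    return sorted(variants)


print(rounding_variants(0.4987))  # [0.0, 0.5]
```

Keeping several variants, rather than committing to one precision, lets the verifier reject the wrong roundings while a correct one (here 0.5) survives.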


4.6 Verify Extracted Expectations (verifyInv)


Recall that geo is a loop while G : P. Given a set of candidate invariants invs, we want to check whether any inv ∈ invs is a loop invariant, i.e., whether inv satisfies
$$\mathit{inv} = [\neg G] \cdot \mathrm{postE} + [G] \cdot \mathrm{wpe}(P, \mathit{inv}). \tag{6}$$
Since the learned model might not predict the expected value for every data point exactly, we must verify whether inv satisfies this equality using verifyInv. If not, verifyInv looks for counterexamples that maximize the violation in order to drive the learning process forward in the next iteration. Formally, for every inv ∈ invs, verifyInv queries computer algebra systems to find a set of program states S such that S includes states maximizing the absolute difference of the two sides in Eq. (6):
$$S \supseteq \operatorname{argmax}_s \big|\mathit{inv}(s) - ([\neg G] \cdot \mathrm{postE} + [G] \cdot \mathrm{wpe}(P, \mathit{inv}))(s)\big|.$$
If there is no program state where the absolute difference is non-zero, verifyInv returns inv as a true invariant. Otherwise, the maximizing states in S are added to the list of counterexamples cex; if no candidate in invs is verified, verifyInv returns False and the accumulated list of counterexamples cex. The next iteration of the CEGIS loop will sample program traces starting from these counterexample initial states, hopefully leading to a learned model with less error.
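EXIST poses this maximization to a computer algebra system. As a stand-in that is easy to run, the sketch below (a swapped-in technique, not the tool's verifier) evaluates the gap between the two sides of Eq. (6) for geo with postE = n on a finite grid of states and returns the states of maximal gap. Unlike a symbolic query, a grid search can miss counterexamples and cannot prove a candidate valid; the hypothetical `phi_geo` hard-codes the right-hand side of Eq. (6) for geo.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Expectation = Callable[[State], float]


def phi_geo(inv: Expectation, s: State) -> float:
    """Right-hand side of Eq. (6) for geo with postE = n:
    [x != 0] * n + [x = 0] * wpe(n <- n + 1; x <-$ Bernoulli(p), inv)."""
    if s["x"] != 0:
        return s["n"]
    after = {**s, "n": s["n"] + 1}
    return s["p"] * inv({**after, "x": 1}) + (1 - s["p"]) * inv({**after, "x": 0})


def verify_on_grid(inv: Expectation, grid: List[State],
                   tol: float = 1e-9) -> Tuple[bool, List[State]]:
    """Grid search standing in for the CAS query: (True, []) if no gap above
    tol is found, else (False, states whose gap is maximal up to tol)."""
    gaps = [abs(inv(s) - phi_geo(inv, s)) for s in grid]
    worst = max(gaps)
    if worst <= tol:
        return True, []
    return False, [s for s, g in zip(grid, gaps) if g >= worst - tol]


grid = [{"x": x, "n": n, "p": p}
        for x, n, p in product((0, 1), range(5), (0.2, 0.5, 0.8))]


def exact(s: State) -> float:  # n + [x = 0] / p
    return s["n"] + (1 / s["p"] if s["x"] == 0 else 0.0)


def wrong(s: State) -> float:  # n + [x = 0] * 2, correct only when p = 1/2
    return s["n"] + (2.0 if s["x"] == 0 else 0.0)


print(verify_on_grid(exact, grid)[0])  # True
ok, cex = verify_on_grid(wrong, grid)
print(ok, len(cex))
```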


5 Learning Sub-invariants 


Next, we instantiate EXIST for our second problem: learning sub-invariants. Given a program geo = while G : P and a pair of pre- and post-expectations (preE, postE), we want to find an expectation I such that preE ≤ I, and
$$I \leq \Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I) := [\neg G] \cdot \mathrm{postE} + [G] \cdot \mathrm{wpe}(P, I).$$
Intuitively, $\Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I)$ computes the expected value of the expectation I after one iteration of the loop. We want to train a model M such that M translates to an expectation I whose value at every state is at most its expected value after one more iteration (or at most postE once the loop has exited), and such that preE ≤ I.

The high-level plan is the same as for learning exact invariants: we train a model to minimize a loss defined to capture the sub-invariant requirements. We generate features F and sample initializations states as before. Then, from each s ∈ states, we repeatedly run just the loop body P and record the set of output states in data; this departs from our method for exact invariants, which repeatedly runs the entire loop to completion. Given this trace data, for any program state s ∈ states and expectation I, we can compute the empirical mean of I's value after running the loop body P on state s. Thus, we can approximate wpe(P, I)(s) for s ∈ states and use this estimate to approximate $\Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I)(s)$. We then define a loss that sums up the violations of $I \leq \Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I)$ and preE ≤ I on the states s ∈ states, estimated based on the collected data.

The main challenge for our approach is that existing model tree learning algorithms do not support our loss function. Roughly speaking, model tree learners typically assume a node's two child subtrees can be learned separately; this is the case when optimizing the loss we used for exact invariants, but not for the loss for sub-invariants, whose estimate of wpe(P, I)(s) evaluates the model at successor states that may be routed to the other subtree.
To solve this challenge, we first broaden the class of models to neural networks. To produce sub-invariants that can be verified, we still want to learn
simple classes of models, such as piecewise functions of numerical expressions. 
Accordingly, we work with a class of neural architectures that can be translated 
into model trees, neural model trees, adapted from neural decision trees devel- 
oped by Yang et al. [41]. We defer the technical details of neural model trees to 
the extended version, but for now, we can treat them as differentiable approxi- 
mations of standard model trees; since they are differentiable they can be learned 
with gradient descent, which can support the sub-invariant loss function. 


Outline. We will discuss changes in sampleTraces, learnInv, and verifyInv for learning sub-invariants but omit descriptions of getFeatures, sampleStates, and extractInv, because EXIST generates features, samples initial states, and extracts expectations in the same way as in Sect. 4. To simplify the exposition, we will assume getFeatures generates the same set of features $F = F_l = F_m$ for model trees with linear models and model trees with multiplication models.


5.1 Sample Training Data (sampleTraces) 


Unlike when sampling data for learning exact invariants, here, sampleTraces runs 
only one iteration of the given program geo = while G : P, that is, just P, instead 
of running the whole loop. Intuitively, this difference in data collection is because 
we aim to directly handle the sub-invariant condition, which encodes a single 
iteration of the loop. For exact invariants, our approach proceeded indirectly by 
learning the expected value of postE after running the loop to termination. 

From any initialization $s_i \in$ states such that G holds on $s_i$, sampleTraces runs the loop body P for $N_{runs}$ trials, each time restarting from $s_i$, and records the set of output states reached. If executing P from $s_i$ leads to output states $\{s_{i1}, \ldots, s_{iN_{runs}}\}$, then sampleTraces produces the training example
$$(s_i, S_i) = (s_i, \{s_{i1}, \ldots, s_{iN_{runs}}\}).$$
For an initialization $s_i \in$ states such that G is false on $s_i$, sampleTraces simply produces $(s_i, S_i) = (s_i, \emptyset)$, since the loop body is not executed.
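The sketch below collects such examples for geo, simulating only its body n ← n + 1; x ←$ Bernoulli(p), and uses them to estimate $\Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I)(s_i)$ as described earlier in this section: postE($s_i$) when G fails, and otherwise the mean of I over $S_i$. Output states are kept in a list, so repeated states are counted with multiplicity; the helper names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def run_geo_body(s: State, rng: random.Random) -> State:
    """One execution of geo's loop body only: n <- n + 1; x <-$ Bernoulli(p)."""
    return {**s, "n": s["n"] + 1, "x": 1 if rng.random() < s["p"] else 0}


def sample_one_step(states: List[State], guard: Callable[[State], bool],
                    n_runs: int, rng: random.Random) -> List[Tuple[State, List[State]]]:
    """Examples (s_i, S_i): n_runs body outputs when G holds on s_i, else []."""
    return [(s, [run_geo_body(s, rng) for _ in range(n_runs)] if guard(s) else [])
            for s in states]


def phi_estimate(inv: Callable[[State], float], post_e: Callable[[State], float],
                 guard: Callable[[State], bool], s: State, outs: List[State]) -> float:
    """Empirical Phi(I)(s): postE(s) if G fails on s, else the mean of I over S_i."""
    if not guard(s):
        return post_e(s)  # short-circuit, so an empty S_i is never divided by
    return sum(inv(t) for t in outs) / len(outs)


def guard_x0(s: State) -> bool:
    return s["x"] == 0


def inv(s: State) -> float:  # candidate I = n + [x = 0] / p
    return s["n"] + (1 / s["p"] if s["x"] == 0 else 0.0)


data = sample_one_step([{"x": 0, "n": 0, "p": 0.5}, {"x": 1, "n": 4, "p": 0.5}],
                       guard_x0, 20000, random.Random(3))
for s, outs in data:
    print(s, len(outs), phi_estimate(inv, lambda t: t["n"], guard_x0, s, outs))
```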


5.2 Learning a Neural Model Tree (learnInv) 


Given the dataset data = $\{(s_1, S_1), \ldots, (s_K, S_K)\}$ (with $K = N_{states}$), we want to learn an expectation I such that preE ≤ I and $I \leq \Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I)$. By case analysis on the guard G, the requirement $I \leq \Phi^{\mathrm{wpe}}_{\mathrm{postE}}(I)$ can be split into two constraints:
$$[G] \cdot I \leq [G] \cdot \mathrm{wpe}(P, I) \qquad \text{and} \qquad [\neg G] \cdot I \leq [\neg G] \cdot \mathrm{postE}.$$
If I = postE + [G]·I', then the second requirement reduces to [¬G]·postE ≤ [¬G]·postE and is always satisfied. So, to simplify the loss and the training process, we again aim to learn an expectation I of the form postE + [G]·I'. Thus, we want to train a model tree T such that T translates into an expectation I', and
$$\mathrm{preE} \leq \mathrm{postE} + [G] \cdot I' \tag{7}$$
$$[G] \cdot (\mathrm{postE} + [G] \cdot I') \leq [G] \cdot \mathrm{wpe}(P, \mathrm{postE} + [G] \cdot I') \tag{8}$$
Then, we define the loss of model tree T on data to be
$$\mathrm{err}(T, \mathit{data}) := \mathrm{err}_1(T, \mathit{data}) + \mathrm{err}_2(T, \mathit{data}),$$
where $\mathrm{err}_1(T, \mathit{data})$ captures Eq. (7) and $\mathrm{err}_2(T, \mathit{data})$ captures Eq. (8).
Defining $\mathrm{err}_1$ is relatively simple: we sum up the one-sided difference between preE(s) and postE(s) + G(s)·T(F(s)) across s ∈ states, where T is the model tree being trained and F(s) is the feature vector F evaluated on s. That is,
$$\mathrm{err}_1(T, \mathit{data}) := \sum_{i=1}^{K} \max\big(0,\ \mathrm{preE}(s_i) - \mathrm{postE}(s_i) - G(s_i) \cdot T(F(s_i))\big). \tag{9}$$
Above, preE($s_i$), postE($s_i$), and G($s_i$) are the values of the expectations preE, postE, and [G] evaluated on program state $s_i$.
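A direct transcription of Eq. (9), assuming as before that the tree is given as a function of the state. The usage example takes, for illustration only, the pre-expectation preE = n + [x = 0] for geo; a constant leaf value 1 satisfies Eq. (7) on every state, whereas 0.5 falls short by 0.5 on each state with x = 0.

```python
from typing import Callable, Dict, List

State = Dict[str, float]


def err_1(tree: Callable[[State], float], states: List[State],
          pre_e: Callable[[State], float], post_e: Callable[[State], float],
          guard: Callable[[State], bool]) -> float:
    """Eq. (9): sum over states of max(0, preE(s) - postE(s) - G(s) * T(F(s))).

    `tree` maps a state directly to T(F(s)); feature extraction is folded in.
    """
    total = 0.0
    for s in states:
        g = 1.0 if guard(s) else 0.0
        total += max(0.0, pre_e(s) - post_e(s) - g * tree(s))
    return total


def pre_e(s: State) -> float:  # illustrative preE = n + [x = 0]
    return s["n"] + (1.0 if s["x"] == 0 else 0.0)


states = [{"x": 0, "n": 0}, {"x": 0, "n": 5}, {"x": 1, "n": 2}]
args = (states, pre_e, lambda s: s["n"], lambda s: s["x"] == 0)
print(err_1(lambda s: 1.0, *args))  # 0.0
print(err_1(lambda s: 0.5, *args))  # 1.0: a shortfall of 0.5 on each state with x = 0
```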

The term $\mathrm{err}_2$ is more involved. Similar to $\mathrm{err}_1$, we aim to sum up the one-sided difference between the two sides of Eq. (8) across states s ∈ states. On a program state s that does not satisfy G, both sides are 0; for s that satisfies G, we want to evaluate wpe(P, postE + [G]·I') on s, but we do not have exact access to wpe(P, postE + [G]·I') and need to approximate its value on s based on sampled program traces. Recall that wpe(P, I)(s) is the expected value of I after running program P from s, and our dataset contains training examples $(s_i, S_i)$ where $S_i$ is a set of states reached after running P on an initial state $s_i$ satisfying G. Thus, we can approximate $[G] \cdot \mathrm{wpe}(P, \mathrm{postE} + [G] \cdot I')(s_i)$ by
$$G(s_i) \cdot \frac{1}{|S_i|} \sum_{s \in S_i} \big(\mathrm{postE}(s) + G(s) \cdot I'(s)\big).$$
To avoid division by zero when $s_i$ does not satisfy G and $S_i$ is empty, we evaluate the expression in a short-circuit manner: when $G(s_i) = 0$, the whole expression immediately evaluates to zero. Therefore, we define
$$\mathrm{err}_2(T, \mathit{data}) := \sum_{i=1}^{K} \max\Big(0,\ G(s_i) \cdot \mathrm{postE}(s_i) + G(s_i) \cdot T(F(s_i)) - G(s_i) \cdot \frac{1}{|S_i|} \sum_{s \in S_i} \big(\mathrm{postE}(s) + G(s) \cdot T(F(s))\big)\Big).$$
Standard model tree learning algorithms do not support this kind of loss function, and since our overall loss err(T, data) is the sum of $\mathrm{err}_1(T, \mathit{data})$ and $\mathrm{err}_2(T, \mathit{data})$, we cannot use standard model tree learning algorithms to optimize err(T, data) either. Fortunately, gradient descent does support this loss function. While gradient descent cannot directly learn model trees, we can use gradient descent to train a neural model tree T to minimize err(T, data). The learned neural networks can be converted to model trees, and then converted to expectations as before. (See the discussion in the extended version.)
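The sketch below transcribes $\mathrm{err}_2$, with the short-circuit on $G(s_i) = 0$ written as an early `continue`; `data` holds pairs $(s_i, S_i)$ as produced by sampleTraces, and the tree is again a function of the state. It only evaluates the loss; training a neural model tree by gradient descent, as EXIST does, is not shown. In the usage example, $S_i$ is a hand-picked sample in which an exit (x = 1) and a repeat (x = 0) each occur once.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def err_2(tree: Callable[[State], float],
          data: List[Tuple[State, List[State]]],
          post_e: Callable[[State], float],
          guard: Callable[[State], bool]) -> float:
    """Sum over (s_i, S_i) of the one-sided violation of Eq. (8), estimating
    [G] * wpe(P, postE + [G] * T) at s_i by the mean over S_i."""
    total = 0.0
    for s, outs in data:
        if not guard(s):
            continue  # short-circuit: G(s_i) = 0 makes the term 0 (and S_i is empty)
        lhs = post_e(s) + tree(s)  # G(s_i) = 1 here
        rhs = sum(post_e(t) + (tree(t) if guard(t) else 0.0) for t in outs) / len(outs)
        total += max(0.0, lhs - rhs)
    return total


def post_n(s: State) -> float:
    return s["n"]


def guard_x0(s: State) -> bool:
    return s["x"] == 0


data = [({"x": 0, "n": 0, "p": 0.5},
         [{"x": 1, "n": 1, "p": 0.5}, {"x": 0, "n": 1, "p": 0.5}]),
        ({"x": 1, "n": 4, "p": 0.5}, [])]
print(err_2(lambda s: 1 / s["p"], data, post_n, guard_x0))  # 0.0
print(err_2(lambda s: 3.0, data, post_n, guard_x0))  # 0.5
```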
5.3 Verify Extracted Expectations (verifyInv)


The verifier verifyInv is very similar to the one in Sect. 4, except that here it solves a different optimization problem. For each candidate inv in the given list invs, it looks for a set S of program states such that S includes
$$\operatorname{argmax}_s\ \big(\mathrm{preE}(s) - \mathit{inv}(s)\big) \qquad \text{and} \qquad \operatorname{argmax}_s\ \big(G(s) \cdot \mathit{inv}(s) - [G] \cdot \mathrm{wpe}(P, \mathit{inv})(s)\big).$$
As in our approach for exact invariant learning, the verifier aims to find counterexample states s that violate at least one of these constraints by as large a margin as possible; these high-quality counterexamples guide data collection in the following iteration of the CEGIS loop. Concretely, the verifier accepts inv if it cannot find any program state s where preE(s) − inv(s) or G(s)·inv(s) − [G]·wpe(P, inv)(s) is positive. Otherwise, it adds all states s ∈ S with strictly positive margin to the set of counterexamples cex.
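As with exact invariants, a grid search can stand in for the computer algebra query in a quick experiment (a swapped-in technique that cannot certify a candidate). The sketch computes both margins for geo with the illustrative pre-expectation preE = n + [x = 0] and the hypothetical candidate family n + [x = 0]·c: c = 1 is accepted, c = 0.5 violates preE ≤ inv, and c = 3 violates the sub-invariant condition wherever p > 1/3, since there 3 exceeds the true value 1/p.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Expectation = Callable[[State], float]


def wpe_body(inv: Expectation, s: State) -> float:
    """wpe(n <- n + 1; x <-$ Bernoulli(p), inv) at s, for the body of geo."""
    after = {**s, "n": s["n"] + 1}
    return s["p"] * inv({**after, "x": 1}) + (1 - s["p"]) * inv({**after, "x": 0})


def margins(inv: Expectation, pre_e: Expectation, s: State) -> Tuple[float, float]:
    """preE(s) - inv(s) and G(s) * inv(s) - [G] * wpe(P, inv)(s), with G: x = 0.
    A positive margin means s violates the corresponding constraint."""
    g = 1.0 if s["x"] == 0 else 0.0
    return pre_e(s) - inv(s), g * inv(s) - g * wpe_body(inv, s)


def find_cex(inv: Expectation, pre_e: Expectation, grid: List[State],
             tol: float = 1e-9) -> List[State]:
    """Grid search standing in for the CAS: states with a strictly positive margin."""
    return [s for s in grid if max(margins(inv, pre_e, s)) > tol]


def pre_e(s: State) -> float:  # illustrative preE = n + [x = 0]
    return s["n"] + (1.0 if s["x"] == 0 else 0.0)


def candidate(c: float) -> Expectation:  # inv = n + [x = 0] * c
    return lambda s: s["n"] + (c if s["x"] == 0 else 0.0)


grid = [{"x": x, "n": n, "p": p}
        for x, n, p in product((0, 1), range(4), (0.2, 0.5, 0.8))]
for c in (1.0, 0.5, 3.0):
    print(c, len(find_cex(candidate(c), pre_e, grid)))
```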


6 Evaluations 


We implemented our prototype in Python, using sklearn and tensorflow to fit
model trees and neural model trees, and Wolfram Alpha to verify candidates and
generate counterexamples. We evaluated our tool on a set of 18 benchmarks
drawn from different sources in prior work [14,21,24]. Our experiments
were designed to address the following research questions:


R1. Can EXIST synthesize exact invariants for a variety of programs? 
R2. Can EXIST synthesize sub-invariants for a variety of programs? 


We summarize our findings as follows: 


— EXIST successfully synthesized and verified exact invariants for 14/18
benchmarks within a timeout of 300 s. Our tool was able to generate these 14
invariants in reasonable time, taking between 1 and 237 s. The sampling phase
dominates the time in most cases. We also compare EXIST with a tool from
prior literature, Mora [7]. We found that Mora can only handle a restrictive
set of programs and cannot handle many of our benchmarks. We also discuss
how our work compares with a few others in Sect. 7.

— To evaluate sub-invariant learning, we created multiple problem instances
for each benchmark by supplying different pre-expectations. On a total of 34
such problem instances, EXIST was able to infer correct sub-invariants in 27 cases,
taking between 7 and 102 s.


We present the tables of complete experimental results in the extended version.
Because the training data we collect are inherently stochastic, the results
produced by our tool are not deterministic.¹ As expected, different trials
on the same benchmark sometimes generate different sub-invariants; and while
the exact invariant for each benchmark is unique, EXIST may also generate
semantically equivalent but syntactically different expectations in different
trials (e.g., for BiasDir).


1 The code and data sampled in the trial that produced the tables in this paper can 
be found at https://github.com/JialuJialu/Exist. 
Table 1. Exact Invariants generated by EXIST
(ST, LT, VT, TT: sampling, learning, verification, and total time in seconds)

Name     postE   Learned Invariant                                     ST      LT     VT     TT
Bin1     x       x + [n < M]·(M·p − n·p)                               25.67   12.03  0.22   37.91
Fair     count   count + [c1 + c2 == 0]·(p1 + p2)/(p1 + p2 − p1·p2)    5.78    1.62   0.30   7.69
Gambler  z       z + [x > 0 and y > x]·x·(y − x)                       112.02  3.52   9.97   125.51
Geo0     z       z + [flip == 0]·(1 − p1)/p1                           12.01   0.85   2.65   15.51
Sum0     x       x + [n > 0]·(0.5·p·n² + 0.5·p·n)                      102.12  34.61  26.74  163.48


Table 2. Sub-invariants generated by EXIST

Name     postE  preE                    Learned Sub-invariant              ST     LT     VT    TT
Gambler  z      x·(y − x)               z + [x > 0 and y > x]·x·(y − x)    7.31   28.87  8.29  44.46
Geo0     z      [flip == 0]·(1 − p1)    z + [flip == 0]·(1 − p1)           8.70   26.13  0.19  35.02
LinExp   z      z + [n > 0]·2           z + [n > 0]·(n + 1)                53.72  30.01  0.35  84.98
LinExp   z      z + [n > 0]·2·n         z + [n > 0]·2·n                    29.18  28.61  0.68  58.48
RevBin   z      z + [x > 0]·x           z + [x > 0]·x/p                    18.17  71.15  2.17  91.55
RevBin   z      z                       z                                  15.62  18.74  0.06  34.42


Implementation Details. For input parameters to EXIST, we use N_runs = 500
and N_states = 500. Besides the input parameters listed in Fig. 2, we allow the user
to supply a list of features as an optional input. In feature generation, getFeatures
enumerates expressions built from program variables and user-supplied features
according to a grammar. Also, when incorporating counterexamples cex, we
make 30 copies of each counterexample to give them more weight in training.
All experiments were conducted on a 2020 MacBook Pro with an M1 chip running
macOS Monterey Version 12.1.
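The exact feature grammar is not spelled out here, so the following is only a hypothetical sketch: it enumerates monomials of bounded degree over program variables and user-supplied features, and it weights counterexamples by duplicating each one 30 times, as described above. The names get_features and add_counterexamples, and the degree bound, are our own choices.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def get_features(variables: Sequence[str], user_features: Sequence[str] = (),
                 max_degree: int = 2) -> List[str]:
    # Hypothetical grammar: products of up to max_degree atoms.
    atoms = list(variables) + list(user_features)
    features = []
    for degree in range(1, max_degree + 1):
        for combo in combinations_with_replacement(atoms, degree):
            features.append("*".join(combo))
    return features


def add_counterexamples(data: List[T], cex: List[T], copies: int = 30) -> List[T]:
    # Duplicating each counterexample raises its weight in the training loss.
    return data + [c for c in cex for _ in range(copies)]
```

Duplication is the simplest way to up-weight points when the learner does not expose per-sample weights; a learner that accepts sample weights could achieve the same effect without copying.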


6.1 R1: Evaluation of the Exact Invariant Method 


Efficacy of Invariant Inference. EXIST was able to infer provably correct
invariants in 14/18 benchmarks. Out of the 14 successful benchmarks, only 2 of them
need user-supplied features (n·p for Bin2 and Sum0). Table 1 shows the
post-expectation (postE), the inferred invariant (Learned Invariant), sampling time
(ST), learning time (LT), verification time (VT) and total time (TT) for a
few benchmarks. For generating exact invariants, the running time of EXIST is
dominated by the sampling time. However, this phase can easily be parallelized.
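The sampling phase estimates, for each sampled initial state, the expected value of postE after the loop by averaging N_runs executions. As a sketch, the Python below does this for a geometric loop in the spirit of Geo0. The loop body (flip a p1-biased coin until it shows heads, counting tails in z) is our assumption, not taken from the benchmark suite; under that assumption the Monte Carlo estimate should approach the invariant z + [flip == 0]·(1 − p1)/p1 listed in Table 1. Since initial states are sampled independently, the outer loop over them is the part that parallelizes easily.

```python
import random


def run_geo0(z: int, flip: int, p1: float, rng: random.Random) -> int:
    # Assumed Geo0-style loop: until the coin shows heads, count tails in z.
    while flip == 0:
        if rng.random() < p1:
            flip = 1
        else:
            z += 1
    return z


def sample_expected_post(z: int, flip: int, p1: float,
                         n_runs: int = 500, seed: int = 0) -> float:
    # Empirical mean of postE = z over n_runs independent executions.
    rng = random.Random(seed)
    return sum(run_geo0(z, flip, p1, rng) for _ in range(n_runs)) / n_runs


def geo0_invariant(z: int, flip: int, p1: float) -> float:
    # Invariant from Table 1: z + [flip == 0] * (1 - p1) / p1
    return z + (1 if flip == 0 else 0) * (1 - p1) / p1
```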


Failure Analysis. EXIST failed to generate invariants for 4/18 benchmarks. For
two of them (DepRV and LinExp), EXIST was able to generate expectations that
are very close to an invariant; for the third failing benchmark (Duel), the
ground truth invariant is very complicated. For LinExp, while a correct invariant
is z + [n > 0]·2.625·n, EXIST generates expectations like z + [n > 0]·(2.63·n − 0.02)
as candidates. For DepRV, a correct invariant is
x·y + [n > 0]·(0.25·n² + 0.5·n·x + 0.5·n·y − 0.25·n), and in our experiment EXIST
generates 0.25·n² + 0.5·n·x + 0.5·n·y − 0.27·n − 0.01·x + 0.12. In both cases, the
ground truth invariants use coefficients with several digits, and since learning
from data is inherently stochastic, EXIST cannot generate them consistently. In
our experiments, we observe that our CEGIS loop does guide the learner closer to
the correct invariant in general, but progress obtained over multiple iterations
can sometimes be offset by noise in a single iteration. For the fourth benchmark,
GeoAr, we observe that the verifier incorrectly accepted the complicated candidate
invariants generated by the learner because Wolfram Alpha was not able to find
valid counterexamples for our queries.


Comparison with Previous Work. There are few existing tools that can
automatically compute expected values after probabilistic loops. We experimented
with one such tool, called Mora [7]. (See high-level comparison in Sect. 7.) We
managed to encode our benchmarks Geo0, Bin0, Bin2, Geo1, GeoAr, and Mart in
its syntax. Among them, Mora fails to infer an invariant for Geo1, GeoAr, and
Mart. We also tried to encode our benchmarks Fair, Gambler, Bin1, and RevBin
but found Mora's syntax too restrictive to encode them.


6.2 R2: Evaluation of the Sub-invariant Method 


Efficacy of Invariant Inference. EXIST is able to synthesize sub-invariants for
27/34 problem instances. As before, Table 2 reports the results for a few benchmarks.
Two of the 27 successful instances use user-supplied features: Gambler with
pre-expectation x·(y − x) uses (y − x), and Sum0 with pre-expectation
x + [n > 0]·(p·n/2) uses p·n. In contrast to the case of exact invariants, the learning
time dominates. This is not surprising: the sampling time is shorter because we only
run one iteration of the loop, but the learning time is longer as we are optimizing
a more complicated loss function.

When gathering benchmarks, we found that for many loops, the pre-expectations
used by prior work, or natural choices of pre-expectations, are themselves
sub-invariants. Thus, for some instances, the sub-invariant generated by EXIST is
the same as the pre-expectation preE given to it as input. However, EXIST is not
checking whether the given preE is a sub-invariant: the learner in EXIST knows
nothing about preE besides the values of preE on program states. We also designed
benchmarks where the pre-expectations are not sub-invariants (BiasDir with
preE = [x ≠ y]·x, DepRV with preE = x·y + [n > 0]·1/4·n², Gambler with
preE = x·(y − x), Geo0 with preE = [flip == 0]·(1 − p1)), and EXIST is able to
generate sub-invariants for 3/4 such benchmarks.
Failure Analysis. On problem instances where EXIST fails to generate a
sub-invariant, we observe two common causes. First, gradient descent seems to get
stuck in local minima: the learner returns suboptimal models with relatively low
loss. The loss we are training on is very complicated and likely to
be highly non-convex, so this is not surprising. Second, we observed inconsistent
behavior due to noise in data collection and learning. For instance, for GeoAr
with preE = z + [z ≠ 0]·y·(1 − p)/p, EXIST could sometimes find a sub-invariant
with the supplied feature (1 − p), but we could not achieve this result consistently.


Comparison with Learning Exact Invariants. The performance of EXIST on 
learning sub-invariants is less sensitive to the complexity of the ground truth 
invariants. For example, EXIST is not able to generate an exact invariant for 
LinExp as its exact invariant is complicated, but EXIST is able to generate 
sub-invariants for LinExp. However, we also observe that when learning sub- 
invariants, EXIST returns complicated expectations with high loss more often. 


7 Related Work 


Invariant Generation for Probabilistic Programs. There has been a steady line of
work on probabilistic invariant generation over the last few years. The PRINSYS
system [21] employs a template-based approach to guide the search for
probabilistic invariants. PRINSYS is able to encode invariants with guard expressions,
but the system doesn't produce invariants directly; instead, PRINSYS produces
logical formulas encoding the invariant conditions, which must be solved manually.

Chen et al. [14] proposed a counterexample-guided approach to find polynomial
invariants by applying Lagrange interpolation. Unlike PRINSYS, this approach
doesn't need templates; however, invariants involving guard expressions, which
are common in our examples, cannot be found, since they are not polynomials.
Additionally, Chen et al. [14] use a weaker notion of invariant, which only
needs to be correct on certain initial states; our tool generates invariants that
are correct on all initial states. Feng et al. [18] improve on Chen et al. [14] by
using Stengle's Positivstellensatz to encode invariant constraints as a
semidefinite programming problem. Their method can find polynomial sub-invariants
that are correct on all initial states. However, their approach cannot synthesize
piecewise linear invariants, and their implementation has additional limitations
and could not be run on our benchmarks.

There is also a line of work on abstract interpretation for analyzing probabilis- 
tic programs; Chakarov and Sankaranarayanan [11] search for linear expectation 
invariants using a “pre-expectation closed cone domain”, while recent work by 
Wang et al. [40] employs a sophisticated algebraic program analysis approach. 

Another line of work applies martingales to derive insights into probabilistic
programs. Chakarov and Sankaranarayanan [10] showed several applications of
martingales in program analysis, and Barthe et al. [5] gave a procedure to
generate candidate martingales for a probabilistic program; however, this tool gives
no control over which expected value is analyzed: the user can only guess initial
expressions, and the tool generates valid bounds, which may not be interesting.
Our tool allows the user to pick which expected value they want to bound.

Another line of work for automated reasoning uses moment-based analysis. 
Bartocci et al. [6,7] develop the Mora tool, which can find the moments of
variables as functions of the iteration for loops that run forever by using ideas from
computational algebraic geometry and dynamical systems. This method is highly 
efficient and is guaranteed to compute moments exactly. However, there are two 
limitations. First, the moments can give useful insights about the distribution 
of variables’ values after each iteration, but they are fundamentally different 
from our notion of invariants which allow us to compute the expected value of 
any given expression after termination of a loop. Second, there are important 
restrictions on the probabilistic programs. For instance, conditional statements 
are not allowed and the use of symbolic inputs is limited. As a result, most of 
our benchmarks cannot be handled by Mora.

In a similar vein, Kura et al. [27,39] bound higher central moments for run- 
ning time and other monotonically increasing quantities. Like our work, these 
works consider probabilistic loops that terminate. However, unlike our work, 
they are limited to programs with constant size increments. 


Data-Driven Invariant Synthesis. We are not aware of other data-driven methods
for learning probabilistic invariants, but recent work by Abate et al. [1] proves
probabilistic termination by learning ranking supermartingales from trace data.
Our method for learning sub-invariants (Sect. 5) can be seen as a natural
generalization of their approach. However, there are also important differences. First,
we are able to learn general sub-invariants, not just ranking supermartingales for
proving termination. Second, our approach aims to learn model trees, which lead
to simpler and more interpretable sub-invariants. In contrast, Abate et al. [1]
learn ranking functions encoded as two-layer neural networks.

Data-driven inference of invariants for deterministic programs has drawn a
lot of attention, starting from DAIKON [17]. ICE learning with decision trees [20]
modifies the decision tree learning algorithm to capture implication
counterexamples to handle inductiveness. HANOI [32] uses counterexample-guided
inductive synthesis (CEGIS) [38] to build a data-driven invariant inference engine
that alternates between weakening and strengthening candidates for synthesis.
Recent work uses neural networks to learn invariants [36]. These systems
perform classification, while our work uses regression. Data from fuzzing has been
used to find almost-correct inductive invariants [29] for programs with closed-box
operations.


Probabilistic Reasoning with Pre-expectations. Following Morgan and McIver,
there are now pre-expectation calculi for domain-specific properties, like 
expected runtime [23] and probabilistic sensitivity [2]. All of these systems define 
the pre-expectation for loops as a least fixed-point, and practical reasoning about 
loops requires finding an invariant of some kind. 
Abstract. We consider the quantitative problem of obtaining lower
bounds on the probability of termination of a given non-deterministic
probabilistic program. Specifically, given a non-termination threshold
p ∈ [0, 1], we aim for certificates proving that the program terminates
with probability at least 1 − p. The basic idea of our approach is to find a
terminating stochastic invariant, i.e. a subset SI of program states such
that (i) the probability of the program ever leaving SI is no more than
p, and (ii) almost-surely, the program either leaves SI or terminates.

While stochastic invariants are already well-known, we provide the 
first proof that the idea above is not only sound, but also complete for 
quantitative termination analysis. We then introduce a novel sound and 
complete characterization of stochastic invariants that enables template- 
based approaches for easy synthesis of quantitative termination certifi- 
cates, especially in affine or polynomial forms. Finally, by combining this 
idea with the existing martingale-based methods that are relatively com- 
plete for qualitative termination analysis, we obtain the first automated, 
sound, and relatively complete algorithm for quantitative termination 
analysis. Notably, our completeness guarantees for quantitative termina- 
tion analysis are as strong as the best-known methods for the qualitative 
variant. 

Our prototype implementation demonstrates the effectiveness of our 
approach on various probabilistic programs. We also demonstrate that 
our algorithm certifies lower bounds on termination probability for prob- 
abilistic programs that are beyond the reach of previous methods. 


1 Introduction 


Probabilistic Programs. Probabilistic programs extend classical imperative 
programs with randomization. They provide an expressive framework for specify- 
ing probabilistic models and have been used in machine learning [22,39], network 
analysis [20], robotics [41] and security [4]. Recent years have seen the
development of many probabilistic programming languages such as Church [23] and
Pyro [6], and their formal analysis is an active topic of research. Probabilistic
programs are often extended with non-determinism to allow for either unknown
user inputs and interactions with the environment or abstraction of parts that are
too complex for formal analysis [31].


Termination. Termination has attracted the most attention in the literature on 
formal analysis of probabilistic programs. In non-probabilistic programs, it is a 
purely qualitative property. In probabilistic programs, it has various extensions: 


1. Qualitative: The almost-sure (a.s.) termination problem asks if the program
terminates with probability 1, whereas the finite termination problem asks
if the expected number of steps until termination is finite.

2. Quantitative: The quantitative probabilistic termination problem asks for a
tight lower bound on the termination probability. More specifically, given a
constant p ∈ [0, 1], it asks whether the program will terminate with probability
at least 1 − p over all possible resolutions of non-determinism.


Previous Qualitative Works. There are many approaches to prove a.s.
termination based on weakest pre-expectation calculus [27,31,37], abstract
interpretation [34], type systems [5] and martingales [7,9,11,14,25,26,32,35]. This work
is closest in spirit to martingale-based approaches. The central concept in these
approaches is that of a ranking supermartingale (RSM) [7], which is a probabilistic
extension of ranking functions. RSMs are a sound and complete proof rule
for finite termination [21], which is a stricter notion than a.s. termination. The 
work of [32] proposed a variant of RSMs that can prove a.s. termination even 
for programs whose expected runtime is infinite, and lexicographic RSMs were 
studied in [1,13]. A main advantage of martingale-based approaches is that they 
can be fully automated for programs with affine/polynomial arithmetic [9,11]. 


Previous Quantitative Works. Quantitative analyses of probabilistic pro- 
grams are often more challenging. There are only a few works that study 
the quantitative termination problem: [5,14,40]. The works [14,40] propose 
martingale-based proof rules for computing lower-bounds on termination proba- 
bility, while [5] considers functional probabilistic programs and proposes a type 
system that allows incrementally searching for type derivations to accumulate a 
lower bound on termination probability. See Sect. 8 for a detailed comparison.


Lack of Completeness. While [5,14,40] all propose sound methods to compute
lower bounds on termination probability, none of them is theoretically
complete, nor do their algorithms provide relative completeness guarantees. This
naturally leaves open whether one can define a complete certificate for proving
termination with probability at least 1 − p, for p ∈ [0, 1], i.e. a certificate that
a probabilistic program admits if and only if it terminates with probability at least
1 − p, and which allows for automated synthesis. Ideally, such a certificate should
also be synthesized automatically by an algorithm with relative completeness
guarantees, i.e. an algorithm which is guaranteed to compute such a certificate
for a sufficiently general subclass of programs. Note that since the problem of
deciding whether a probabilistic program terminates with probability at least 1 − p is
undecidable, one cannot hope for a general complete algorithm, so the best one
can hope for is relative completeness.


Our Approach. We present the first method for the probabilistic termination
problem that is complete. Our approach builds on that of [14] and uses stochastic
invariants in combination with a.s. reachability certificates in order to compute
lower bounds on the termination probability. A stochastic invariant [14] is a
tuple (SI, p) consisting of a set SI of program states and an upper bound p on
the probability of a random program run ever leaving SI. If one computes a
stochastic invariant (SI, p) with the additional property that a random program
run would, with probability 1, either terminate or leave SI, then, since SI is
left with probability at most p, the program must terminate with probability at
least 1 − p. Hence, the combination of stochastic invariants and a.s. reachability
certificates provides a sound approach to the probabilistic termination problem.
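The soundness argument above is a union bound. As a worked version, write P^σ for the probability measure induced by a fixed resolution σ of the non-determinism, Term for the event that the run terminates, and Leave_SI for the event that it ever leaves SI (this notation is ours). Assuming P^σ[Leave_SI] ≤ p and that every run almost-surely terminates or leaves SI:

```latex
\begin{align*}
1 = \mathbb{P}^{\sigma}[\mathit{Term} \cup \mathit{Leave}_{SI}]
  &\le \mathbb{P}^{\sigma}[\mathit{Term}] + \mathbb{P}^{\sigma}[\mathit{Leave}_{SI}] \\
\Longrightarrow\quad \mathbb{P}^{\sigma}[\mathit{Term}]
  &\ge 1 - \mathbb{P}^{\sigma}[\mathit{Leave}_{SI}] \;\ge\; 1 - p .
\end{align*}
```

Since σ was arbitrary, the bound holds for every resolution of the non-determinism.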

While this idea was originally proposed in [14], our method for computing 
stochastic invariants is fundamentally different and leads to completeness. In [14], 
a stochastic invariant is computed indirectly by computing the set SI together 
with a repulsing supermartingale (RepSM), which can then be used to compute 
a probability threshold p for which (SI, p) is a stochastic invariant. It was shown
in [40, Section 3] that RepSMs are incomplete for computing stochastic invari- 
ants. Moreover, even if a RepSM exists, the resulting probability bound need not 
be tight and the method of [14] does not allow optimizing the computed bound 
or guiding computation towards a bound that exceeds some specified probability 
threshold. 

In this work, we propose a novel and orthogonal approach that computes 
the stochastic invariant and the a.s. termination certificate at the same time 
and is provably complete for certifying a specified lower bound on termina- 
tion probability. First, we show that stochastic invariants can be characterized 
through the novel notion of stochastic invariant indicators (SI-indicators). The 
characterization is both sound and complete. Furthermore, it allows fully auto- 
mated computation of stochastic invariants for programs using affine or poly- 
nomial arithmetic via a template-based approach that reduces quantitative ter- 
mination analysis to constraint solving. Second, we prove that stochastic invari- 
ants together with an a.s. reachability certificate, when synthesized in tandem, 
are not only sound for probabilistic termination, but also complete. Finally, we 
present the first relatively complete algorithm for probabilistic termination. Our 
algorithm considers polynomial probabilistic programs and simultaneously com- 
putes a stochastic invariant and an a.s. reachability certificate in the form of an 
RSM using a template-based approach. Our algorithmic approach is relatively 
complete. 

While we focus on the probabilistic termination problem in which the goal is
to verify a given lower bound 1 − p on the termination probability, we note that
our method may be straightforwardly adapted to compute a lower bound on the
termination probability. In particular, we may perform a binary search on p and
search for the smallest value of p for which 1 − p can be verified to be a lower
bound on the termination probability.
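A minimal sketch of that binary search, assuming a hypothetical oracle certifies(p) that runs the synthesis for threshold p and returns True on success. The search relies on monotonicity: a certificate for the bound 1 − p also certifies 1 − p' for every p' ≥ p, and p = 1 (the trivial bound 0) is treated as always certifiable.

```python
from typing import Callable


def best_lower_bound(certifies: Callable[[float], bool], tol: float = 1e-3) -> float:
    """Return a certified lower bound 1 - p on the termination probability,
    within tol of the best bound the (hypothetical) oracle can certify."""
    if certifies(0.0):
        return 1.0  # almost-sure termination certified
    # Invariant: certifies(lo) is False; certifies(hi) is True (assumed for hi = 1).
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if certifies(mid):
            hi = mid
        else:
            lo = mid
    return 1.0 - hi
```

Returning 1 − hi rather than 1 − lo keeps the answer on the certified side of the threshold; each oracle call is a full constraint-solving run, so tol trades precision against the number of synthesis calls (about log2(1/tol)).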


Contributions. Our specific contributions in this work are as follows: 


1. We present a sound and complete characterization of stochastic invariants 
through the novel notion of stochastic invariant indicators (Sect. 4). 

2. We prove that stochastic invariants together with an a.s. reachability certifi- 
cate are sound and complete for proving that a probabilistic program termi- 
nates with at least a given probability threshold (Sect. 5). 

3. We present a relatively complete algorithm for computing SI-indicators, and 
hence stochastic invariants over programs with affine or polynomial arith- 
metic. By combining it with the existing relatively complete algorithms for 
RSM computation, we obtain the first algorithm for probabilistic termination 
that provides completeness guarantees (Sect. 6). 

4. We implement a prototype of our approach and demonstrate its effectiveness 
over various benchmarks (Sect. 7). We also show that our approach can handle 
programs that were beyond the reach of previous methods. 


2 Overview 


Before presenting general theorems and algorithms, we first illustrate our method
on the probabilistic program in Fig. 1. The program models a 1-dimensional
discrete-time random walk over the real line that starts at x = 0 and terminates
once a point with x < 0 is reached. In every time step, x is incremented by a
random value sampled according to the uniform distribution Uniform([−1, 0.5]).
However, if the stochastic process is in a point with x > 100, then the value
of x might also be incremented by a random value independently sampled from
Uniform([−1, 2]). The choice of whether the second increment happens is
non-deterministic. By a standard random walk argument, the program does not
terminate almost-surely: if the non-determinism always chooses the second increment
above x = 100, the expected change per step there is −0.25 + 0.5 = 0.25 > 0, so a
run that reaches this region escapes to infinity with positive probability.


Outline of Our Method. Let p = 0.01. To prove that this program terminates
with probability at least 1 − p = 0.99, our method computes the following two
objects:

1. Stochastic invariant. A stochastic invariant is a tuple (SI, p) such that SI is
a set of program states that a random program run leaves with probability at
most p.

2. Termination proof for the stochastic invariant. A ranking supermartingale
(RSM) [7] is computed in order to prove that the program will, with
probability 1, either terminate or leave the set SI. Since SI is left with probability
at most p, the program must terminate with probability at least 1 − p.
x := 0
ℓinit:  while x ≥ 0 do
ℓ1:       r1 := Uniform([−1, 0.5])
ℓ2:       x := x + r1
ℓ3:       if x > 100 then
ℓ4:         if * then
ℓ5:           r2 := Uniform([−1, 2])
ℓ6:           x := x + r2


Fig. 1. Our running example. 


Synthesizing SI. To find a stochastic invariant, our method computes a state 
function f which assigns a non-negative real value to each reachable program 
state. We call this function a stochastic invariant indicator (SI-indicator), and it 
serves the following two purposes: First, exactly those states which are assigned 
a value strictly less than 1 are considered a part of the stochastic invariant SI. 
Second, the value assigned to each state is an upper-bound on the probability 
of leaving SI if the program starts from that state. Finally, by requiring that 
the value of the SI-indicator at the initial state of the program is at most p, we 
ensure a random program run leaves the stochastic invariant with probability at 
most p. 

In Sect. 4, we will define SI-indicators in terms of conditions that ensure the 
properties above and facilitate automated computation. We also show that SI- 
indicators serve as a sound and complete characterization of stochastic invari- 
ants, which is one of the core contributions of this work. The significance of 
completeness of the characterization is that, in order to search for a stochas- 
tic invariant with a given probability threshold p, one may equivalently search 
for an SI-indicator with the same probability threshold whose computation can 
be automated. As we will discuss in Sect. 8, previous approaches to the
synthesis of stochastic invariants were neither complete nor provided tight probability
bounds. For Fig. 1, we have the following set SI which will be left with
probability at most p = 0.01:


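$$SI(\ell, x, r_1, r_2) \iff \begin{cases} x < 99 & \text{if } \ell \in \{\ell_{init}, \ell_1, \ell_2, \ell_3, \ell_{out}\} \\ \text{false} & \text{otherwise.} \end{cases} \qquad (1)$$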


An SI-indicator for this stochastic invariant is: 


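$$f(\ell, x, r_1, r_2) = \begin{cases} \frac{x+1}{100} & \text{if } \ell \in \{\ell_{init}, \ell_1, \ell_3, \ell_{out}\} \text{ and } x < 99 \\ \frac{x+1+r_1}{100} & \text{if } \ell = \ell_2 \text{ and } x < 99 \\ 1 & \text{otherwise.} \end{cases} \qquad (2)$$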


It is easy to check that (SI, 0.01) is a stochastic invariant and that for every
state s = (ℓ, x, r1, r2), the value f(s) is an upper bound on the probability of
eventually leaving SI if program execution starts at s. Also, s ∈ SI ⟺ f(s) < 1.
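As a sanity check (not a proof), one can estimate by simulation the probability of leaving SI from a state at ℓinit and compare it with f. The sketch assumes f(ℓinit, x, r1, r2) = (x + 1)/100 for x < 99, which is our reading of the extracted Eq. (2). Inside SI the non-deterministic branch at ℓ4 is never reached, so each iteration only adds r1 ~ Uniform([−1, 0.5]), and a run leaves SI once x ≥ 99 at ℓ3. The helper names are hypothetical.

```python
import random


def f_init(x: float) -> float:
    # Assumed SI-indicator value at l_init (our reading of Eq. (2)).
    return (x + 1) / 100 if x < 99 else 1.0


def leaves_si(x: float, rng: random.Random, max_steps: int = 1_000_000) -> bool:
    """Simulate Fig. 1 from l_init; True iff the run leaves SI (x >= 99 at l3)."""
    for _ in range(max_steps):
        if x < 0:
            return False  # loop exits: the run terminates inside SI
        x += rng.uniform(-1.0, 0.5)  # l1 and l2: sample r1 and add it to x
        if x >= 99:
            return True   # the state at l3 violates x < 99
    return False  # step budget exhausted (practically unreachable: drift is -0.25)


def estimate_leave_probability(x0: float, trials: int = 1000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(leaves_si(x0, rng) for _ in range(trials)) / trials
```

Estimates obtained this way should stay at or below f; since f only has to be an upper bound, a sizeable gap between the two is expected.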
Synthesizing a Termination Proof. To prove that a probabilistic program
terminates with probability at least 1 − p, our method searches for a stochastic
invariant (SI, p) for which, additionally, a random program run with probability
1 either leaves SI or terminates. This idea is formalized in Theorem 2, which
shows that stochastic invariants provide a sound and complete certificate for
proving that a given probabilistic program terminates with probability at least
1 − p. In order to impose this additional condition, our method simultaneously
computes an RSM for the set of states ¬SI ∪ State_term, where State_term is the
set of all terminal states. RSMs are a classical certificate for proving almost-sure
termination or reachability in probabilistic programs. A state function η is said
to be an RSM for ¬SI ∪ State_term if it satisfies the following two conditions:


— Non-negativity. η(ℓ, x, r1, r2) ≥ 0 for any reachable state (ℓ, x, r1, r2) ∈ SI;

— ε-decrease in expectation. There exists ε > 0 such that, for any reachable
non-terminal state (ℓ, x, r1, r2) ∈ SI, the value of η decreases in expectation
by at least ε after a one-step execution of the program from (ℓ, x, r1, r2).


The existence of an RSM for ¬SI ∪ State_term implies that the program will, with
probability 1, either terminate or leave SI. As (SI, p) is a stochastic invariant,
we can readily conclude that the program terminates with probability at least
1 − p = 0.99. An example RSM with ε = 0.05 for our example above is:


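$$\eta(\ell, x, r_1, r_2) = \begin{cases} x + 1.1 & \text{if } \ell = \ell_{init} \\ x + 1.05 & \text{if } \ell = \ell_1 \\ x + 1.2 + r_1 & \text{if } \ell = \ell_2 \\ x + 1.15 & \text{if } \ell = \ell_3 \\ x + 1 & \text{if } \ell = \ell_{out} \\ 100 & \text{otherwise.} \end{cases} \qquad (3)$$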
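For an affine RSM the two conditions can be checked by hand; the short Python sketch below does it numerically on a grid of reachable states in SI. It assumes our reading of the extracted Eq. (3), in particular the entry x + 1.15 at ℓ3, which is garbled in the source; we chose it because, given the other entries, it is the only value of the form x + c that decreases by ε = 0.05 both from ℓ2 and towards ℓinit. Because η at ℓ2 is affine in r1, its expectation is obtained by substituting E[r1] = −0.25. Function names are hypothetical.

```python
R1_MEAN = (-1.0 + 0.5) / 2  # E[r1] for r1 ~ Uniform([-1, 0.5])


def eta(loc: str, x: float, r1: float = 0.0) -> float:
    # Assumed RSM (our reading of Eq. (3)).
    values = {
        "init": x + 1.1,
        "l1": x + 1.05,
        "l2": x + 1.2 + r1,
        "l3": x + 1.15,
        "out": x + 1.0,
    }
    return values.get(loc, 100.0)


def expected_decrease(loc: str, x: float, r1: float = 0.0) -> float:
    """eta(s) - E[eta(successor of s)] for a non-terminal state s in SI."""
    if loc == "init":
        succ = eta("l1", x) if x >= 0 else eta("out", x)
    elif loc == "l1":
        succ = eta("l2", x, R1_MEAN)  # eta is affine in r1
    elif loc == "l2":
        succ = eta("l3", x + r1)      # executes x := x + r1
    elif loc == "l3":
        succ = eta("init", x)         # x < 99 in SI, so the x > 100 branch is skipped
    else:
        raise ValueError(f"unexpected location {loc}")
    return eta(loc, x, r1) - succ


def check_rsm(eps: float = 0.05, tol: float = 1e-9) -> bool:
    xs = [i / 10 for i in range(-10, 990)]  # x in [-1, 98.9]
    r1s = [-1.0, -0.5, 0.0, 0.5]
    for x in xs:
        states = [("init", x, 0.0), ("l3", x, 0.0)]
        if x >= 0:  # l1 and l2 are only reached after passing the guard x >= 0
            states += [("l1", x, 0.0)] + [("l2", x, r) for r in r1s]
        for loc, xv, r in states:
            if eta(loc, xv, r) < -tol or expected_decrease(loc, xv, r) < eps - tol:
                return False
    return True
```

A grid check like this can only refute a candidate; the template-based method instead encodes both conditions as constraints that must hold for all states satisfying the invariant.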


Simultaneous Synthesis. Our method employs a template-based approach 
and synthesizes the SI and the RSM simultaneously. We assume that our method 
is provided with an affine/polynomial invariant J which over-approximates the 
set of all reachable states in the program, which is necessary since the defining 
conditions of SI-indicators and RSMs are required to hold at all reachable pro- 
gram states. Note that invariant generation is an orthogonal and well-studied 
problem and can be automated using [10]. For both the SI-indicator and the 
RSM, our method first fixes a symbolic template affine/polynomial expression 
for each location in the program. Then, all the defining conditions of SI-indicators 
and RSMs are encoded as a system of constraints over the symbolic template 
variables, where reachability of program states is encoded using the invariant J, 
and the synthesis proceeds by solving this system of constraints. We describe 
our algorithm in Sect. 6, and show that it is relatively complete with respect to 
the provided invariant J and the probability threshold 1 — p. On the other hand, 
we note that our algorithm can also be adapted to compute lower bounds on the 
termination probability by combining it with a binary search on p. 
Completeness vs Relative Completeness. Our characterization of stochastic
invariants using indicator functions is complete. So is our reduction from
quantitative termination analysis to the problem of synthesizing an SI-indicator
function and a certificate for almost-sure reachability. These are our core
theoretical contributions in this work. Nevertheless, as mentioned above, RSMs are
complete only for finite termination, not a.s. termination. Moreover, template-based
approaches lead to completeness guarantees only for solutions that match the 
template, e.g. polynomial termination certificates of a bounded degree. There- 
fore, our end-to-end approach is only relatively complete. These losses of com- 
pleteness are due to Rice’s undecidability theorem and inevitable even in qual- 
itative termination analysis. In this work, we successfully provide approaches 
for quantitative termination analysis that are as complete as the best known 
methods for the qualitative case. 


3 Preliminaries 


We consider imperative arithmetic probabilistic programs with non-determinism. 
Our programs allow standard programming constructs such as conditional 
branching, while-loops and variable assignments. They also allow two
probabilistic constructs: probabilistic branching, which is indicated in the syntax by
a command 'if prob(p) then ...' with p ∈ [0, 1] a real constant, and sampling
instructions of the form x := d where d is a probability distribution. Sampling
instructions may contain both discrete (e.g. Bernoulli, geometric or Poisson) and
continuous (e.g. uniform, normal or exponential) distributions. We also allow
constructs for (demonic) non-determinism. We have non-deterministic branching,
which is indicated in the syntax by 'if * then ...', and non-deterministic
assignments represented by an instruction of the form x := ndet([a, b]), where
a, b ∈ ℝ ∪ {±∞} and [a, b] is a (possibly unbounded) real interval from which
the new variable value is chosen non-deterministically. We also allow one or
both sides of the interval to be open. The complete syntax of our programs is
presented in [12, Appendix A].


Notation. We use boldface symbols to denote vectors. For a vector $\mathbf{x}$ of dimension $n$ and $1 \le i \le n$, $\mathbf{x}[i]$ denotes the $i$-th component of $\mathbf{x}$. We write $\mathbf{x}[i \leftarrow a]$ to denote the $n$-dimensional vector $\mathbf{y}$ with $\mathbf{y}[i] = a$ and $\mathbf{y}[j] = \mathbf{x}[j]$ for $j \neq i$.


Program Variables. Variables in our programs are real-valued. Given a finite set of variables $V$, a variable valuation of $V$ is a vector $\mathbf{x} \in \mathbb{R}^{|V|}$.


Probabilistic Control-Flow Graphs (pCFGs). We model our programs via probabilistic control-flow graphs (pCFGs) [11,14]. A probabilistic control-flow graph (pCFG) is a tuple $\mathcal{C} = (L, V, \ell_{init}, \mathbf{x}_{init}, \mapsto, G, Pr, Up)$, where:


- $L$ is a finite set of locations, partitioned into locations of conditional branching $L_C$, probabilistic branching $L_P$, non-deterministic branching $L_N$ and assignment $L_A$;

- $V = \{x_1, \ldots, x_{|V|}\}$ is a finite set of program variables;

- $\ell_{init}$ is the initial program location;

- $\mathbf{x}_{init} \in \mathbb{R}^{|V|}$ is the initial variable valuation;

- $\mapsto \subseteq L \times L$ is a finite set of transitions. For each transition $\tau = (\ell, \ell')$, we say that $\ell$ is its source location and $\ell'$ its target location;

- $G$ is a map assigning to each transition $\tau = (\ell, \ell') \in \mapsto$ with $\ell \in L_C$ a guard $G(\tau)$, which is a logical formula over $V$ specifying whether $\tau$ can be executed;

- $Pr$ is a map assigning to each transition $\tau = (\ell, \ell') \in \mapsto$ with $\ell \in L_P$ a probability $Pr(\tau) \in [0,1]$. We require $\sum_{\tau = (\ell, \_)} Pr(\tau) = 1$ for each $\ell \in L_P$;

- $Up$ is a map assigning to each transition $\tau = (\ell, \ell') \in \mapsto$ with $\ell \in L_A$ an update $Up(\tau) = (j, u)$, where $j \in \{1, \ldots, |V|\}$ is a target variable index and $u$ is an update element which can be:

  - the bottom element $u = \bot$, denoting no update;

  - a Borel-measurable expression $u : \mathbb{R}^{|V|} \to \mathbb{R}$, denoting a deterministic variable assignment;

  - a probability distribution $u = d$, denoting that the new variable value is sampled according to $d$;

  - an interval $u = [a, b] \subseteq \mathbb{R} \cup \{\pm\infty\}$, denoting a non-deterministic update. We also allow one or both sides of the interval to be open.


We assume the existence of a special terminal location denoted by $\ell_{out}$. We also require that each location has at least one outgoing transition, and that each $\ell \in L_A$ has a unique outgoing transition. For each location $\ell \in L_C$, we assume that the disjunction of guards of all transitions outgoing from $\ell$ is equivalent to true, i.e. $\bigvee_{\tau = (\ell, \_)} G(\tau) \equiv \mathit{true}$. Translation of probabilistic programs to pCFGs that model them is standard, so we omit the details and refer the reader to [11]. The pCFG for the program in Fig. 1 is provided in [12, Appendix B].
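The structural requirements above are easy to check mechanically. The sketch below encodes a pCFG as plain Python data and reports violations; the `PCFG` class, the kind tags `"C"`, `"P"`, `"N"`, `"A"` and all location names (including `l0` for the initial ndet assignment) are hypothetical choices for illustration, not the authors' representation. Guards are not modeled, so the requirement that the guards at each $\ell \in L_C$ cover all valuations is not checked, and, as an assumption, the terminal location is encoded as an assignment location with a no-op self-loop.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Transition = Tuple[str, str]  # (source location, target location)


@dataclass
class PCFG:
    kinds: Dict[str, str]             # location -> "C", "P", "N" or "A"
    transitions: List[Transition]
    prob: Dict[Transition, float] = field(default_factory=dict)  # only for "P" sources


def well_formedness_problems(g: PCFG, tol: float = 1e-9) -> List[str]:
    """Return violations of the structural pCFG requirements (empty list = OK)."""
    problems = []
    for loc, kind in g.kinds.items():
        out = [t for t in g.transitions if t[0] == loc]
        if not out:
            problems.append(f"{loc}: no outgoing transition")
        if kind == "A" and len(out) != 1:
            problems.append(f"{loc}: assignment location needs a unique outgoing transition")
        if kind == "P":
            total = sum(g.prob.get(t, 0.0) for t in out)
            if abs(total - 1.0) > tol:
                problems.append(f"{loc}: outgoing probabilities sum to {total}, not 1")
    return problems


# Skeleton of the program in Fig. 2 (guards and updates omitted).
fig2 = PCFG(
    kinds={"l0": "A", "l_init": "C", "l1": "A", "l2": "P", "l3": "C", "l_out": "A"},
    transitions=[("l0", "l_init"), ("l_init", "l1"), ("l_init", "l2"), ("l1", "l_init"),
                 ("l2", "l3"), ("l2", "l_out"), ("l3", "l3"), ("l_out", "l_out")],
    prob={("l2", "l3"): 0.5, ("l2", "l_out"): 0.5},
)
print(well_formedness_problems(fig2))  # []
```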


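States, Paths and Runs. A state in a pCFG $\mathcal{C}$ is a tuple $(\ell, \mathbf{x})$, where $\ell$ is a location in $\mathcal{C}$ and $\mathbf{x} \in \mathbb{R}^{|V|}$ is a variable valuation of $V$. We say that a transition $\tau = (\ell, \ell')$ is enabled at a state $(\ell, \mathbf{x})$ if $\ell \notin L_C$, or if $\ell \in L_C$ and $\mathbf{x} \models G(\tau)$. We say that a state $(\ell', \mathbf{x}')$ is a successor of $(\ell, \mathbf{x})$ if there exists an enabled transition $\tau = (\ell, \ell')$ in $\mathcal{C}$ such that $(\ell', \mathbf{x}')$ can be reached from $(\ell, \mathbf{x})$ by executing $\tau$, i.e. we can obtain $\mathbf{x}'$ by applying the updates of $\tau$ to $\mathbf{x}$, if any. A finite path in $\mathcal{C}$ is a sequence $(\ell_0, \mathbf{x}_0), (\ell_1, \mathbf{x}_1), \ldots, (\ell_k, \mathbf{x}_k)$ of states with $(\ell_0, \mathbf{x}_0) = (\ell_{init}, \mathbf{x}_{init})$ and with $(\ell_{i+1}, \mathbf{x}_{i+1})$ being a successor of $(\ell_i, \mathbf{x}_i)$ for each $0 \le i \le k-1$. A state $(\ell, \mathbf{x})$ is reachable in $\mathcal{C}$ if there exists a finite path in $\mathcal{C}$ that ends in $(\ell, \mathbf{x})$. A run (or execution) in $\mathcal{C}$ is an infinite sequence of states where each finite prefix is a finite path. We use $State_{\mathcal{C}}$, $Fpath_{\mathcal{C}}$, $Run_{\mathcal{C}}$, $Reach_{\mathcal{C}}$ to denote the set of all states, finite paths, runs and reachable states in $\mathcal{C}$, respectively. Finally, we use $State_{term}$ to denote the set $\{(\ell_{out}, \mathbf{x}) \mid \mathbf{x} \in \mathbb{R}^{|V|}\}$ of terminal states.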


Schedulers. The behavior of a pCFG may be captured by defining a probability space over the set of all runs in the pCFG. For this to be done, however, we need to resolve non-determinism, and this is achieved via the standard notion of a scheduler. A scheduler in a pCFG $\mathcal{C}$ is a map $\sigma$ which to each finite path $\rho \in Fpath_{\mathcal{C}}$ assigns a probability distribution $\sigma(\rho)$ over successor states of the last state in $\rho$. Since we deal with programs operating over real-valued variables, the set $Fpath_{\mathcal{C}}$ may be uncountable. We therefore impose an additional measurability assumption on schedulers, in order to ensure that the semantics of probabilistic programs with non-determinism is defined in a mathematically sound way. The restriction to measurable schedulers is standard, so we omit the formal definition.


Semantics of pCFGs. A pCFG $\mathcal{C}$ together with a scheduler $\sigma$ defines a stochastic process taking values in the set of states of $\mathcal{C}$, whose trajectories correspond to runs in $\mathcal{C}$. The process starts in the initial state $(\ell_{init}, \mathbf{x}_{init})$ and inductively extends the run, where the next state along the run is either chosen deterministically or sampled from the probability distribution defined by the current location along the run and by the scheduler $\sigma$. This is the classical operational semantics of Markov decision processes (MDPs), see e.g. [1,27]. A pCFG $\mathcal{C}$ and a scheduler $\sigma$ together determine a probability space $(Run_{\mathcal{C}}, \mathcal{F}_{\mathcal{C}}, \mathbb{P}^\sigma)$ over the set of all runs in $\mathcal{C}$. For details, see [12, Appendix C]. We denote by $\mathbb{E}^\sigma$ the expectation operator on $(Run_{\mathcal{C}}, \mathcal{F}_{\mathcal{C}}, \mathbb{P}^\sigma)$. We may analogously define a probability space $(Run_{\mathcal{C},(\ell,\mathbf{x})}, \mathcal{F}_{\mathcal{C},(\ell,\mathbf{x})}, \mathbb{P}^\sigma_{(\ell,\mathbf{x})})$ over the set of all runs in $\mathcal{C}$ that start in some specified state $(\ell, \mathbf{x})$.


Probabilistic Termination Problem. We now define the termination problem for probabilistic programs considered in this work. A state $(\ell, \mathbf{x})$ in a pCFG $\mathcal{C}$ is said to be a terminal state if $\ell = \ell_{out}$. A run $\rho \in Run_{\mathcal{C}}$ is said to be terminating if it reaches some terminal state in $\mathcal{C}$. We use $\mathit{Term} \subseteq Run_{\mathcal{C}}$ to denote the set of all terminating runs in $Run_{\mathcal{C}}$. The termination probability of a pCFG $\mathcal{C}$ is defined as $\inf_\sigma \mathbb{P}^\sigma[\mathit{Term}]$, i.e. the smallest probability of the set of terminating runs in $\mathcal{C}$ with respect to any scheduler in $\mathcal{C}$ (for the proof that $\mathit{Term}$ is measurable, see [40]). We say that $\mathcal{C}$ terminates almost-surely (a.s.) if its termination probability is 1. In this work, we consider the Lower Bound on the Probability of Termination (LBPT) problem that, given $p \in [0,1]$, asks whether $1 - p$ is a lower bound for the termination probability of the given probabilistic program, i.e. whether $\inf_\sigma \mathbb{P}^\sigma[\mathit{Term}] \ge 1 - p$.
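As an informal illustration of these definitions, and not a decision procedure for LBPT, the sketch below simulates runs of a small hypothetical program under two hypothetical memoryless schedulers and estimates $\mathbb{P}^\sigma[\mathit{Term}]$ for each. Runs that do not terminate within a step bound are counted as non-terminating, so each estimate is biased downwards, while the minimum over finitely many schedulers can only over-approximate the infimum. Deciding LBPT soundly requires a certificate such as the one developed in this paper.

```python
import random


def terminates(scheduler, rng, max_steps=200):
    """Simulate one run of the hypothetical program
           x := 1;
           while x >= 1 do
               if prob(0.6) then x := x + ndet([1, 2]) else x := x - 1
           od
    and report whether it reaches the terminal location within max_steps
    loop iterations. The scheduler resolves ndet([1, 2]) given the current x."""
    x = 1
    for _ in range(max_steps):
        if x < 1:
            return True          # loop guard false: the run terminates
        if rng.random() < 0.6:
            x += scheduler(x)    # non-deterministic increment chosen by the scheduler
        else:
            x -= 1
    return x < 1                 # runs cut off by the bound count as non-terminating


def estimate_termination(scheduler, runs=10_000, seed=0):
    rng = random.Random(seed)
    return sum(terminates(scheduler, rng) for _ in range(runs)) / runs


schedulers = {"always 1": lambda x: 1, "always 2": lambda x: 2}
estimates = {name: estimate_termination(s) for name, s in schedulers.items()}
# Exact values: 2/3 for "always 1" and (sqrt(33) - 3) / 6, about 0.457, for "always 2".
print(estimates)
print("minimum over these two schedulers:", min(estimates.values()))
```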


4 A Sound and Complete Characterization of SIs 


In this section, we recall the notion of stochastic invariants and present our characterization of stochastic invariants through stochastic indicator functions. We fix a pCFG $\mathcal{C} = (L, V, \ell_{init}, \mathbf{x}_{init}, \mapsto, G, Pr, Up)$. A predicate function in $\mathcal{C}$ is a map $F$ that to every location $\ell \in L$ assigns a logical formula $F(\ell)$ over program variables. It naturally induces a set of states, which we require to be Borel-measurable for the semantics to be well-defined. By a slight abuse of notation, we identify a predicate function $F$ with this set of states. Furthermore, we use $\neg F$ to denote the negation of a predicate function, i.e. $(\neg F)(\ell) = \neg F(\ell)$. An invariant in $\mathcal{C}$ is a predicate function $I$ which additionally over-approximates the set of reachable states in $\mathcal{C}$, i.e. for every $(\ell, \mathbf{x}) \in Reach_{\mathcal{C}}$ we have $\mathbf{x} \models I(\ell)$. Stochastic invariants can be viewed as a probabilistic extension of invariants: a random program run may leave them, but only with bounded probability. See Sect. 2 for an example.
Definition 1 (Stochastic invariant [14]). Let $SI$ be a predicate function in $\mathcal{C}$ and $p \in [0,1]$ a probability. The tuple $(SI, p)$ is a stochastic invariant (SI) if the probability of a run in $\mathcal{C}$ leaving the set of states defined by $SI$ is at most $p$ under any scheduler. Formally, we require that

$$\sup_\sigma \mathbb{P}^\sigma\big[\{\rho \in Run_{\mathcal{C}} \mid \rho \text{ reaches some } (\ell, \mathbf{x}) \text{ with } \mathbf{x} \not\models SI(\ell)\}\big] \le p.$$


Key Challenge. If we find a stochastic invariant $(SI, p)$ for which termination happens almost-surely on runs that do not leave $SI$, we can immediately conclude that the program terminates with probability at least $1 - p$ (this idea is formalized in Sect. 5). The key challenge in designing an efficient termination analysis based on this idea is the computation of appropriate stochastic invariants. We present a sound and complete characterization of stochastic invariants which allows for their effective automated synthesis through template-based methods.

We characterize stochastic invariants through the novel notion of stochastic invariant indicators (SI-indicators). An SI-indicator is a function that to each state assigns an upper bound on the probability of violating the stochastic invariant if we start the program in that state. Since the definition of an SI-indicator imposes conditions on its value at reachable states, and since computing the exact set of reachable states is in general infeasible, we define SI-indicators with respect to a supporting invariant, with the later automation in mind. In order to understand the ideas of this section, one may assume for simplicity that the invariant exactly equals the set of reachable states. A state function in $\mathcal{C}$ is a function $f$ that to each location $\ell \in L$ assigns a Borel-measurable real-valued function over program variables $f(\ell) : \mathbb{R}^{|V|} \to \mathbb{R}$. We use $f(\ell, \mathbf{x})$ and $f(\ell)(\mathbf{x})$ interchangeably.


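Definition 2 (Stochastic invariant indicator). A tuple $(f_{SI}, p)$ comprising a state function $f_{SI}$ and probability $p \in [0,1]$ is a stochastic invariant indicator (SI-indicator) with respect to an invariant $I$ if it satisfies the following conditions:

(C1) Non-negativity. For every location $\ell \in L$, we have $\mathbf{x} \models I(\ell) \Rightarrow f_{SI}(\ell, \mathbf{x}) \ge 0$.

(C2) Non-increasing expected value. For every location $\ell \in L$, we have:
- If $\ell \in L_C$, then for any transition $\tau = (\ell, \ell')$ we have $\mathbf{x} \models I(\ell) \wedge G(\tau) \Rightarrow f_{SI}(\ell, \mathbf{x}) \ge f_{SI}(\ell', \mathbf{x})$.
- If $\ell \in L_P$, then $\mathbf{x} \models I(\ell) \Rightarrow f_{SI}(\ell, \mathbf{x}) \ge \sum_{\tau = (\ell, \ell') \in \mapsto} Pr(\tau) \cdot f_{SI}(\ell', \mathbf{x})$.
- If $\ell \in L_N$, then $\mathbf{x} \models I(\ell) \Rightarrow f_{SI}(\ell, \mathbf{x}) \ge \max_{\tau = (\ell, \ell') \in \mapsto} f_{SI}(\ell', \mathbf{x})$.
- If $\ell \in L_A$ with $\tau = (\ell, \ell')$ the unique outgoing transition from $\ell$, then:
  - If $Up(\tau) = (j, \bot)$, we have $\mathbf{x} \models I(\ell) \Rightarrow f_{SI}(\ell, \mathbf{x}) \ge f_{SI}(\ell', \mathbf{x})$.
  - If $Up(\tau) = (j, u)$ with $u : \mathbb{R}^{|V|} \to \mathbb{R}$ an expression, we have $\mathbf{x} \models I(\ell) \Rightarrow f_{SI}(\ell, \mathbf{x}) \ge f_{SI}(\ell', \mathbf{x}[j \leftarrow u(\mathbf{x})])$.
  - If $Up(\tau) = (j, u)$ with $u = d$ a distribution, we have $\mathbf{x} \models I(\ell) \Rightarrow f_{SI}(\ell, \mathbf{x}) \ge \mathbb{E}_{X \sim d}[f_{SI}(\ell', \mathbf{x}[j \leftarrow X])]$.
  - If $Up(\tau) = (j, u)$ with $u = [a, b]$ an interval, we have $\mathbf{x} \models I(\ell) \Rightarrow f_{SI}(\ell, \mathbf{x}) \ge \sup_{X \in [a, b]} f_{SI}(\ell', \mathbf{x}[j \leftarrow X])$.

(C3) Initial condition. We have $f_{SI}(\ell_{init}, \mathbf{x}_{init}) \le p$.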
Intuition. (C1) imposes that $f_{SI}$ is non-negative at any state contained in the invariant $I$. Next, for any state in $I$, (C2) imposes that the value of $f_{SI}$ does not increase in expectation upon a one-step execution of the pCFG under any scheduler. Finally, condition (C3) imposes that the initial value of $f_{SI}$ in $\mathcal{C}$ is at most $p$. Together, these conditions make the indicator an over-approximation of the probability of violating $SI$. An example of an SI-indicator for our running example in Fig. 1 is given in (2). The following theorem formalizes the above intuition and is our main result of this section. In essence, we prove that $(SI, p)$ is a stochastic invariant in $\mathcal{C}$ iff there exists an SI-indicator $(f_{SI}, p)$ such that $SI$ contains all states at which $f_{SI}$ is strictly smaller than 1. This implies that, for every stochastic invariant $(SI, p)$, there exists an SI-indicator such that $(SI', p)$ defined via $SI'(\ell) = (\mathbf{x} \models I(\ell) \wedge f_{SI}(\ell, \mathbf{x}) < 1)$ is a stochastic invariant that is at least as tight as $(SI, p)$.


Theorem 1 (Soundness and Completeness of SI-indicators). Let $\mathcal{C}$ be a pCFG, $I$ an invariant in $\mathcal{C}$ and $p \in [0,1]$. For any SI-indicator $(f_{SI}, p)$ with respect to $I$, the predicate map $SI$ defined as $SI(\ell) = (\mathbf{x} \models I(\ell) \wedge f_{SI}(\ell, \mathbf{x}) < 1)$ yields a stochastic invariant $(SI, p)$ in $\mathcal{C}$. Conversely, for every stochastic invariant $(SI, p)$ in $\mathcal{C}$, there exist an invariant $I_{SI}$ and a state function $f_{SI}$ such that $(f_{SI}, p)$ is an SI-indicator with respect to $I_{SI}$ and for each $\ell \in L$ we have $SI(\ell) \supseteq (\mathbf{x} \models I_{SI}(\ell) \wedge f_{SI}(\ell, \mathbf{x}) < 1)$.


Proof Sketch. Since the proof is technically involved, we present the main ideas here and defer the details to [12, Appendix E]. First, suppose that $I$ is an invariant in $\mathcal{C}$ and that $(f_{SI}, p)$ is an SI-indicator with respect to $I$, and let $SI(\ell) = (\mathbf{x} \models I(\ell) \wedge f_{SI}(\ell, \mathbf{x}) < 1)$ for each $\ell \in L$. We need to show that $(SI, p)$ is a stochastic invariant in $\mathcal{C}$. Let $\sup_\sigma \mathbb{P}^\sigma_{(\ell,\mathbf{x})}[\mathit{Reach}(\neg SI)]$ be the state function that maps each state $(\ell, \mathbf{x})$ to the probability of reaching $\neg SI$ from $(\ell, \mathbf{x})$. We consider a lattice of non-negative semi-analytic state functions $(\mathcal{L}, \sqsubseteq)$ with the partial order defined via $f \sqsubseteq f'$ if $f(\ell, \mathbf{x}) \le f'(\ell, \mathbf{x})$ holds for each state $(\ell, \mathbf{x})$ in $I$. See [12, Appendix D] for a review of lattice theory. It follows from a result in [40] that the probability of reaching $\neg SI$ can be characterized as the least fixed point of the next-time operator $X_{\neg SI} : \mathcal{L} \to \mathcal{L}$. Away from $\neg SI$, the operator $X_{\neg SI}$ simulates a one-step execution of $\mathcal{C}$ and maps $f \in \mathcal{L}$ to its maximal expected value upon one-step execution of $\mathcal{C}$, where the maximum is taken over all schedulers; at states contained in $\neg SI$, the operator $X_{\neg SI}$ is equal to 1. It was also shown in [40] that, if a state function $f \in \mathcal{L}$ is a pre-fixed point of $X_{\neg SI}$, then it satisfies $\sup_\sigma \mathbb{P}^\sigma_{(\ell,\mathbf{x})}[\mathit{Reach}(\neg SI)] \le f(\ell, \mathbf{x})$ for each $(\ell, \mathbf{x})$ in $I$. Now, by checking the defining properties of pre-fixed points and recalling that $f_{SI}$ satisfies the Non-negativity condition (C1) and the Non-increasing expected value condition (C2) in Definition 2, we can show that $f_{SI}$ is contained in the lattice $\mathcal{L}$ and is a pre-fixed point of $X_{\neg SI}$. It follows that $\sup_\sigma \mathbb{P}^\sigma_{(\ell_{init},\mathbf{x}_{init})}[\mathit{Reach}(\neg SI)] \le f_{SI}(\ell_{init}, \mathbf{x}_{init})$. On the other hand, by the initial condition (C3) in Definition 2 we know that $f_{SI}(\ell_{init}, \mathbf{x}_{init}) \le p$. Hence, we have $\sup_\sigma \mathbb{P}^\sigma_{(\ell_{init},\mathbf{x}_{init})}[\mathit{Reach}(\neg SI)] \le p$, so $(SI, p)$ is a stochastic invariant.
Conversely, suppose that $(SI, p)$ is a stochastic invariant in $\mathcal{C}$. We show in [12, Appendix E] that, if we define $I_{SI}$ to be the trivial true invariant and define $f_{SI}(\ell, \mathbf{x}) = \sup_\sigma \mathbb{P}^\sigma_{(\ell,\mathbf{x})}[\mathit{Reach}(\neg SI)]$, then $(f_{SI}, p)$ forms an SI-indicator with respect to $I_{SI}$. The claim follows by again using the fact that $f_{SI}$ is the least fixed point of the operator $X_{\neg SI}$, from which we can conclude that $(f_{SI}, p)$ satisfies conditions (C1) and (C2) in Definition 2. On the other hand, the fact that $(SI, p)$ is a stochastic invariant and our choice of $f_{SI}$ imply that $(f_{SI}, p)$ satisfies the initial condition (C3) in Definition 2. Hence, $(f_{SI}, p)$ forms an SI-indicator with respect to $I_{SI}$. Furthermore, $SI(\ell) \supseteq (\mathbf{x} \models I_{SI}(\ell) \wedge f_{SI}(\ell, \mathbf{x}) < 1)$ follows since $1 > f_{SI}(\ell, \mathbf{x}) = \sup_\sigma \mathbb{P}^\sigma_{(\ell,\mathbf{x})}[\mathit{Reach}(\neg SI)]$ implies that $(\ell, \mathbf{x})$ cannot be contained in $\neg SI$, so $\mathbf{x} \models SI(\ell)$. This concludes the proof. □

Based on the theorem above, in order to compute a stochastic invariant in $\mathcal{C}$ for a given probability threshold $p$, it suffices to synthesize a state function $f_{SI}$ that together with $p$ satisfies all the defining conditions in Definition 2 with respect to some supporting invariant $I$, and then consider a predicate function $SI$ defined via $SI(\ell) = (\mathbf{x} \models I(\ell) \wedge f_{SI}(\ell, \mathbf{x}) < 1)$ for each $\ell \in L$. This will be the guiding principle of our algorithmic approach in Sect. 6.
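The completeness direction of Theorem 1 can be reproduced numerically on a finite toy model. The sketch below uses a hypothetical random walk (not an example from this paper): it fixes $\neg SI = \{4\}$, computes $f(s) = \sup_\sigma \mathbb{P}^\sigma_s[\mathit{Reach}(\neg SI)]$ by iterating the next-time operator from the zero function (value iteration, which approaches the least fixed point from below), and then checks (C1) to (C3) up to a small numerical tolerance. Real programs have infinitely many states, which is why the paper synthesizes $f_{SI}$ symbolically instead.

```python
# Toy finite model: states 0..4, state 0 is terminal, state 4 forms the set not-SI.
# At interior states a demonic scheduler picks the probability of stepping up.
ACTIONS = {"fair": 0.5, "biased_up": 0.6}   # action -> probability of s -> s + 1
TERMINAL, BAD = 0, 4
STATES = range(5)


def successors(s, action):
    """One-step distribution as (probability, next_state) pairs."""
    if s in (TERMINAL, BAD):
        return [(1.0, s)]                    # absorbing self-loop
    up = ACTIONS[action]
    return [(up, s + 1), (1.0 - up, s - 1)]


def expected_max(f, s):
    """Maximal expected value of f after one step from s, over all actions."""
    return max(sum(q * f[t] for q, t in successors(s, a)) for a in ACTIONS)


def next_time(f):
    """Next-time operator: 1 on not-SI, one-step maximal expectation elsewhere."""
    return {s: 1.0 if s == BAD else expected_max(f, s) for s in STATES}


def least_fixed_point(iterations=10_000):
    f = {s: 0.0 for s in STATES}             # start from the bottom element
    for _ in range(iterations):
        f = next_time(f)
    return f


def is_si_indicator(f, init, p, tol=1e-9):
    c1 = all(f[s] >= -tol for s in STATES)                     # non-negativity
    c2 = all(f[s] >= expected_max(f, s) - tol for s in STATES)  # non-increasing expectation
    c3 = f[init] <= p + tol                                     # initial condition
    return c1 and c2 and c3


f = least_fixed_point()
print(round(f[1], 6))                      # 0.415385, i.e. 27/65
print(is_si_indicator(f, init=1, p=0.42))  # True
print(is_si_indicator(f, init=1, p=0.40))  # False
print([s for s in STATES if f[s] < 1])     # SI = [0, 1, 2, 3]
```

The scheduler that always picks `biased_up` is optimal here, so $f(1)$ coincides with the classical gambler's-ruin probability $(1 - r)/(1 - r^4)$ with $r = 0.4/0.6$.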


Intuition on Characterization. Stochastic invariants can essentially be thought of as quantitative safety specifications in probabilistic programs: $(SI, p)$ is a stochastic invariant if and only if a random probabilistic program run leaves $SI$ with probability at most $p$. However, what makes their computation hard is that the safe set is not specified in advance. Rather, the computation of stochastic invariants requires computing both the safe set and the certificate that it is left with at most the given probability. Nevertheless, in order to reason about them, we may consider $SI$ as an implicitly defined safe set. Hence, if we impose conditions on a state function $f_{SI}$ to be an upper bound on the reachability probability for the target set of states $\neg(\mathbf{x} \models I(\ell) \wedge f_{SI}(\ell, \mathbf{x}) < 1)$, and in addition impose that $f_{SI}(\ell_{init}, \mathbf{x}_{init}) \le p$, then these together entail that $p$ is an upper bound on the probability of ever leaving $SI$ when starting in the initial state. This is the intuitive idea behind our construction of SI-indicators, as well as behind our soundness and completeness proof. In the proof, we show that conditions (C1) and (C2) in Definition 2 indeed entail the necessary conditions to be an upper bound on the reachability probability of the set $\neg(\mathbf{x} \models I(\ell) \wedge f_{SI}(\ell, \mathbf{x}) < 1)$.


5 Stochastic Invariants for LBPT 


In the previous section, we paved the way for automated synthesis of stochastic invariants by providing a sound and complete characterization in terms of SI-indicators. We now show how stochastic invariants in combination with any a.s. termination certificate for probabilistic programs can be used to compute lower bounds on the probability of termination. Theorem 2 below states a general result about termination probabilities that is agnostic to the termination certificate, and shows that stochastic invariants provide a sound and complete approach to quantitative termination analysis.
Theorem 2 (Soundness and Completeness of SIs for Quantitative Termination). Let $\mathcal{C} = (L, V, \ell_{init}, \mathbf{x}_{init}, \mapsto, G, Pr, Up)$ be a pCFG and $(SI, p)$ a stochastic invariant in $\mathcal{C}$. Suppose that, with respect to every scheduler, a run in $\mathcal{C}$ almost-surely either terminates or reaches a state in $\neg SI$, i.e.

$$\inf_\sigma \mathbb{P}^\sigma[\mathit{Term} \cup \mathit{Reach}(\neg SI)] = 1. \qquad (4)$$

Then $\mathcal{C}$ terminates with probability at least $1 - p$. Conversely, if $\mathcal{C}$ terminates with probability at least $1 - p$, then there exists a stochastic invariant $(SI, p)$ in $\mathcal{C}$ such that, with respect to every scheduler, a run in $\mathcal{C}$ almost-surely either terminates or reaches a state in $\neg SI$.


Proof Sketch. The first part (soundness) follows directly from the definition of $SI$ and (4). The completeness proof is conceptually and technically involved and is presented in [12, Appendix H]. In short, the central idea is to construct, for every $n$ greater than a specific threshold $n_0$, a stochastic invariant $(SI_n, p + \frac{1}{n})$ such that a run almost-surely either terminates or exits $SI_n$. Then, we show that $\bigcup_{n \ge n_0} SI_n$ is our desired $SI$. To construct each $SI_n$, we consider the infimum termination probability at every state $(\ell, \mathbf{x})$ and call it $r(\ell, \mathbf{x})$. The infimum is taken over all schedulers. We then let $SI_n$ be the set of states $(\ell, \mathbf{x})$ for which $r(\ell, \mathbf{x})$ is greater than a specific threshold $\alpha$. Intuitively, our stochastic invariant is the set of program states from which the probability of termination is at least $\alpha$, no matter how the non-determinism is resolved. Let us call these states likely-terminating. The intuition is that a random run of the program will terminate or eventually leave the likely-terminating states with high probability.


Quantitative to Qualitative Termination. Theorem 2 provides us with a recipe for computing lower bounds on the probability of termination once we are able to compute stochastic invariants: if $(SI, p)$ is a stochastic invariant in a pCFG $\mathcal{C}$, it suffices to prove that the set of states $State_{term} \cup \neg SI$ is reached almost-surely with respect to any scheduler in $\mathcal{C}$, i.e. the program terminates or violates $SI$. Note that this is simply a qualitative a.s. termination problem, except that the set of terminal states is now augmented with $\neg SI$. Then, since $(SI, p)$ is a stochastic invariant, it follows that a terminal state is reached with probability at least $1 - p$. Moreover, the theorem shows that this approach is both sound and complete. In other words, proving quantitative termination, i.e. that we reach $State_{term}$ with probability at least $1 - p$, is now reduced to (i) finding a stochastic invariant $(SI, p)$ and (ii) proving that the program $\mathcal{C}'$ obtained by adding $\neg SI$ to the set of terminal states of $\mathcal{C}$ is a.s. terminating. Note that, to preserve completeness, (i) and (ii) should be achieved in tandem, i.e. an approach that first synthesizes and fixes $SI$ and then tries to prove a.s. termination of the resulting $\mathcal{C}'$ is not complete.


Ranking Supermartingales. While our reduction above is agnostic to the type of proof/certificate that is used to establish a.s. termination, in this work we use Ranking Supermartingales (RSMs) [7], which are a standard and classical certificate for proving a.s. termination and reachability. Let $\mathcal{C} = (L, V, \ell_{init}, \mathbf{x}_{init}, \mapsto, G, Pr, Up)$ be a pCFG and $I$ an invariant in $\mathcal{C}$. Note that, as in Definition 2, the main purpose of the invariant is to allow for automated synthesis, and one can again simply assume it to equal the set of reachable states. An $\varepsilon$-RSM for a subset $T$ of states is a state function that is non-negative in each state in $I$, and whose expected value decreases by at least $\varepsilon > 0$ upon a one-step execution of $\mathcal{C}$ in any state that is not contained in the target set $T$. Thus, intuitively, a program run has an expected tendency to approach the target set $T$, where the distance to $T$ is given by the value of the RSM, which is required to be non-negative in all states in $I$. The $\varepsilon$-ranked expected value condition is formally captured via the next-time operator $X$ (see [12, Appendix E]). An example of an RSM for our running example in Fig. 1 and the target set of states $\neg SI \cup State_{term}$, with $SI$ the stochastic invariant in Eq. (1), is given in Eq. (3).


Definition 3 (Ranking supermartingales). Let $T$ be a predicate function defining a set of target states in $\mathcal{C}$, and let $\varepsilon > 0$. A state function $\eta$ is said to be an $\varepsilon$-ranking supermartingale ($\varepsilon$-RSM) for $T$ with respect to the invariant $I$ if it satisfies the following conditions:

1. Non-negativity. For each location $\ell \in L$ and $\mathbf{x} \models I(\ell)$, we have $\eta(\ell, \mathbf{x}) \ge 0$.
2. $\varepsilon$-ranked expected value. For each location $\ell \in L$ and $\mathbf{x} \models I(\ell) \wedge \neg T(\ell)$, we have $\eta(\ell, \mathbf{x}) \ge X(\eta)(\ell, \mathbf{x}) + \varepsilon$.


Note that the second condition can be expanded according to location types in exactly the same manner as condition (C2) of Definition 2. The only difference is that in Definition 2 the expected value had to be non-increasing, whereas here it has to decrease by at least $\varepsilon$. It is well-known that the two conditions above entail that $T$ is reached with probability 1 with respect to any scheduler [7,11].
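To see the location-wise expansion at work, the sketch below spot-checks both conditions of Definition 3 for a hand-made $\frac{1}{4}$-RSM. The program, its pCFG locations, the invariant and $\eta$ are hypothetical (not taken from this paper), the target set is $T = State_{term}$, and exact rational arithmetic avoids rounding issues. Checking finitely many sample points does not prove the conditions; the synthesis algorithm of Sect. 6 establishes them symbolically for all valuations.

```python
from fractions import Fraction as F

# Hypothetical program:  while x >= 1 do
#                            if prob(3/4) then x := x - 1 else x := x + 1
#                        od
# pCFG locations: head (L_C), branch (L_P), dec and inc (L_A), out (terminal).
eta = {
    "head":   lambda x: 2 * x + 1,
    "branch": lambda x: 2 * x + F(1, 2),
    "dec":    lambda x: 2 * x - F(3, 4),
    "inc":    lambda x: 2 * x + F(13, 4),
    "out":    lambda x: F(0),
}
invariant = {
    "head":   lambda x: x >= 0,
    "branch": lambda x: x >= 1,
    "dec":    lambda x: x >= 1,
    "inc":    lambda x: x >= 1,
    "out":    lambda x: x >= 0,
}
TARGET = {"out"}


def next_time(loc, x):
    """X(eta)(loc, x): expected value of eta after one pCFG step from (loc, x).
    The program has no non-determinism, so no maximum over schedulers is needed."""
    if loc == "head":                       # conditional branching on x >= 1
        return eta["branch"](x) if x >= 1 else eta["out"](x)
    if loc == "branch":                     # probabilistic branching
        return F(3, 4) * eta["dec"](x) + F(1, 4) * eta["inc"](x)
    if loc == "dec":                        # assignment x := x - 1
        return eta["head"](x - 1)
    if loc == "inc":                        # assignment x := x + 1
        return eta["head"](x + 1)
    return eta["out"](x)                    # terminal self-loop


def is_rsm_on_samples(eps, samples):
    for loc, inv in invariant.items():
        for x in samples:
            if not inv(x):
                continue
            if eta[loc](x) < 0:                                        # condition 1
                return False
            if loc not in TARGET and eta[loc](x) < next_time(loc, x) + eps:
                return False                                           # condition 2
    return True


samples = [F(k, 4) for k in range(200)]     # x in {0, 1/4, ..., 199/4}
print(is_rsm_on_samples(F(1, 4), samples))  # True
print(is_rsm_on_samples(F(1, 2), samples))  # False: at branch, eta drops by only 1/4
```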


Theorem 3 (Proof in [12, Appendix I]). Let $\mathcal{C}$ be a pCFG, $I$ an invariant in $\mathcal{C}$ and $T$ a predicate function defining a target set of states. If there exist $\varepsilon > 0$ and an $\varepsilon$-RSM for $T$ with respect to $I$, then $T$ is a.s. reached under any scheduler, i.e.

$$\inf_\sigma \mathbb{P}^\sigma[\mathit{Reach}(T)] = 1.$$


The following theorem is an immediate corollary of Theorems 2 and 3. 


Theorem 4. Let $\mathcal{C}$ be a pCFG and $I$ be an invariant in $\mathcal{C}$. Suppose that there exist a stochastic invariant $(SI, p)$, an $\varepsilon > 0$ and an $\varepsilon$-RSM $\eta$ for $State_{term} \cup \neg SI$ with respect to $I$. Then $\mathcal{C}$ terminates with probability at least $1 - p$.


Therefore, in order to prove that $\mathcal{C}$ terminates with probability at least $1 - p$, it suffices to find (i) a stochastic invariant $(SI, p)$ in $\mathcal{C}$, and (ii) an $\varepsilon$-RSM $\eta$ for $State_{term} \cup \neg SI$ with respect to $I$ and some $\varepsilon > 0$. Note that these two tasks are interdependent. We cannot simply choose any stochastic invariant. For instance, the trivial predicate function defined via $SI = \mathit{true}$ always yields a valid stochastic invariant for any $p \in [0,1]$, but it does not help termination analysis: $\neg SI$ is then empty, so (ii) amounts to proving a.s. termination of $\mathcal{C}$ itself. Instead, we need to compute a stochastic invariant and an RSM for it simultaneously.
Power of Completeness. We end this section by showing that our approach certifies a tight lower bound on the termination probability for a program that was proven in [40] not to admit any of the previously existing certificates for lower bounds on termination probability. This shows that our completeness pays off in practice and our approach is able to handle programs that were beyond the reach of previous methods. Consider the program in Fig. 2 annotated by an invariant $I$. We show that our approach certifies that this program terminates with probability at least 0.5. Indeed, consider a stochastic invariant $(SI, 0.5)$ with $SI(\ell) = \mathit{true}$ if $\ell \neq \ell_3$ and $SI(\ell_3) = \mathit{false}$, and a state function defined via $\eta(\ell_{init}, x) = -\log(x) + \log(2) + 3$, $\eta(\ell_1, x) = -\log(x) + \log(2) + 2$, $\eta(\ell_2, x) = 1$ and $\eta(\ell_3, x) = \eta(\ell_{out}, x) = 0$ for each $x$. Then one can easily check by inspection that $(SI, 0.5)$ is a stochastic invariant and that $\eta$ is a $(\log(2) - 1)$-RSM for $State_{term} \cup \neg SI$ with respect to $I$. Therefore, it follows by Theorem 4 that the program in Fig. 2 terminates with probability at least 0.5.
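The bound can be sanity-checked by simulation. The sketch below (our own illustration) runs the program of Fig. 2 for a few values that a scheduler might choose for `x := ndet((0, 1))`, modeling the infinite loop at $\ell_3$ as an immediately detected non-terminating run. Since the first loop always exits for $x > 0$ and the coin at $\ell_2$ is fair, every estimate should be close to 0.5, in line with the tight lower bound certified above.

```python
import random


def fig2_run_terminates(x0: float, rng: random.Random) -> bool:
    """One run of the program in Fig. 2 with x0 chosen by the scheduler, 0 < x0 < 1."""
    x = x0
    while x < 1:             # l_init / l_1: keep doubling x
        x = 2 * x
    # l_2: with probability 0.5 the run enters l_3 (while true do skip od) and
    # never terminates; otherwise it reaches l_out.
    return rng.random() >= 0.5


def estimate(x0: float, runs: int = 20_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    return sum(fig2_run_terminates(x0, rng) for _ in range(runs)) / runs


for x0 in (0.001, 0.3, 0.999):
    print(x0, estimate(x0))   # each value should be close to 0.5
```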


6 Automated Template-Based Synthesis Algorithm 


We now provide template-based relatively complete algorithms for simultaneous 
and automated synthesis of SI-indicators and RSMs, in order to solve the quanti- 
tative termination problem over pCFGs with affine/polynomial arithmetic. Our 
approach builds upon the ideas of [2,9] for qualitative and non-probabilistic 
cases. 


        x := ndet((0, 1))
ℓinit:  while x < 1 do               {0 < x < 2}
ℓ1:         x := 2 · x               {0 < x < 1}
        od
ℓ2:     if prob(0.5) then            {1 ≤ x < 2}
ℓ3:         while true do skip od    {1 ≤ x < 2}
ℓout:                                {1 ≤ x < 2}

Fig. 2. A program that was shown in [40] not to admit a repulsing supermartingale [14] or a gamma-scaled supermartingale [40], but for which our method can certify the tight lower bound of 0.5 on the probability of termination.


Input and Assumptions. The input to our algorithms consists of a pCFG $\mathcal{C}$ together with a probability $p \in [0,1]$, an invariant $I$,* and technical variables $\delta$ and $M$, which specify polynomial template sizes used by the algorithm and which will be discussed later. In this section, we limit our focus to affine/polynomial pCFGs, i.e. we assume that all guards $G(\tau)$ in $\mathcal{C}$ and all invariants $I(\ell)$ are conjunctions of affine/polynomial inequalities over program variables. Similarly, we assume that every update function $u : \mathbb{R}^{|V|} \to \mathbb{R}$ used in deterministic variable assignments is an affine/polynomial expression in $\mathbb{R}[V]$.


* We assume an invariant is given as part of the input. Invariant generation is an 
orthogonal and well-studied problem and can be automated using [10, 16]. 
Output. The goal of our algorithms is to synthesize a tuple $(f, \eta, \varepsilon)$ where $f$ is an SI-indicator function, $\eta$ is a corresponding RSM, and $\varepsilon > 0$, such that:

- At every location $\ell$ of $\mathcal{C}$, both $f(\ell)$ and $\eta(\ell)$ are affine/polynomial expressions of fixed degree $\delta$ over the program variables $V$.

- Having $SI(\ell) := \{\mathbf{x} \mid f(\ell, \mathbf{x}) < 1\}$, the pair $(SI, p)$ is a valid stochastic invariant and $\eta$ is an $\varepsilon$-RSM for $State_{term} \cup \neg SI$ with respect to $I$.

As shown in Sects. 4 and 5, such a tuple $w = (f, \eta, \varepsilon)$ serves as a certificate that the probabilistic program modeled by $\mathcal{C}$ terminates with probability at least $1 - p$. We call $w$ a quantitative termination certificate.


Overview. Our algorithm is a standard template-based approach similar to [2,9]. We encode the requirements of Definitions 2 and 3 as entailments between affine/polynomial inequalities with unknown coefficients and then apply the classical Farkas' Lemma [17] or Putinar's Positivstellensatz [38] to reduce the synthesis problem to Quadratic Programming (QP). Finally, we solve the resulting QP using a numerical optimizer or an SMT solver. Our approach consists of the four steps below. Step 3 follows [2] exactly, so we refer to [2] for more details on this step.


Step 1. Setting Up Templates. The algorithm sets up symbolic templates with unknown coefficients for $f$, $\eta$ and $\varepsilon$.

- First, for each location $\ell$ of $\mathcal{C}$, the algorithm sets up a template for $f(\ell)$ which is a polynomial consisting of all possible monomials of degree at most $\delta$ over program variables, each appearing with an unknown coefficient. For example, consider the program in Fig. 1 of Sect. 2. This program has three variables: $x$, $r_1$ and $r_2$. If $\delta = 1$, i.e. if the goal is to find an affine SI-indicator, at every location $\ell_i$ of the program, the algorithm sets $f(\ell_i, x, r_1, r_2) := \tilde{c}_{i,0} + \tilde{c}_{i,1} \cdot x + \tilde{c}_{i,2} \cdot r_1 + \tilde{c}_{i,3} \cdot r_2$. Similarly, if the desired degree is $\delta = 2$, the algorithm symbolically computes $f(\ell_i, x, r_1, r_2) := \tilde{c}_{i,0} + \tilde{c}_{i,1} \cdot x + \tilde{c}_{i,2} \cdot r_1 + \tilde{c}_{i,3} \cdot r_2 + \tilde{c}_{i,4} \cdot x^2 + \tilde{c}_{i,5} \cdot x \cdot r_1 + \tilde{c}_{i,6} \cdot x \cdot r_2 + \tilde{c}_{i,7} \cdot r_1^2 + \tilde{c}_{i,8} \cdot r_1 \cdot r_2 + \tilde{c}_{i,9} \cdot r_2^2$. Note that every monomial of degree at most 2 appears in this expression. The goal is to synthesize suitable real values for each unknown coefficient $\tilde{c}_{i,j}$ such that $f$ becomes an SI-indicator. Throughout this section, we use the $\sim$ notation to denote an unknown coefficient whose value will be synthesized by our algorithm.

- The algorithm creates an unknown variable $\tilde{\varepsilon}$ whose final value will serve as $\varepsilon$.

- Finally, at each location $\ell$ of $\mathcal{C}$, the algorithm sets up a template for $\eta(\ell)$ in exactly the same manner as the template for $f(\ell)$. The goal is to synthesize values for $\tilde{\varepsilon}$ and the $\tilde{c}$ variables in this template such that $\eta$ becomes a valid $\varepsilon$-RSM for $State_{term} \cup \neg SI$ with respect to $I$.
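Step 1 can be sketched in a few lines of Python. The sketch only builds the template as a string; the coefficient naming `c_i,j` (standing for $\tilde{c}_{i,j}$) and the monomial order are our own choices for illustration. For the three variables of Fig. 1 it produces the 4 and 10 monomials listed above, in the same order.

```python
from itertools import combinations_with_replacement


def monomials(variables, degree):
    """All monomials of total degree <= degree, each as a tuple of variable names."""
    result = []
    for d in range(degree + 1):
        result.extend(combinations_with_replacement(variables, d))
    return result


def template(location, variables, degree):
    """Polynomial template with one unknown coefficient c_{location,j} per monomial."""
    terms = []
    for j, mono in enumerate(monomials(variables, degree)):
        coeff = f"c_{location},{j}"
        terms.append("*".join((coeff,) + mono))   # e.g. "c_1,5*x*r1"
    return " + ".join(terms)


print(template(1, ["x", "r1", "r2"], 1))
# c_1,0 + c_1,1*x + c_1,2*r1 + c_1,3*r2
print(len(monomials(["x", "r1", "r2"], 2)))  # 10
```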


Step 2. Generating Entailment Constraints. In this step, the algorithm symbolically computes the requirements of Definition 2, i.e. (C1) to (C3), and their analogues in Definition 3 using the templates generated in the previous step. Note that all of these requirements are entailments between affine/polynomial inequalities over program variables whose coefficients are unknown. In other words, they are of the form $\forall \mathbf{x}\; A(\mathbf{x}) \Rightarrow b(\mathbf{x})$, where $A$ is a set of affine/polynomial inequalities over program variables whose coefficients contain the unknown variables $\tilde{c}$ and $\tilde{\varepsilon}$ generated in the previous step, and $b$ is a single such inequality. For example, for the program of Fig. 1, the algorithm symbolically computes condition (C1) at location $\ell_1$ as follows: $\forall \mathbf{x}\; I(\ell_1, \mathbf{x}) \Rightarrow f(\ell_1, \mathbf{x}) \ge 0$. Assuming that the given invariant is $I(\ell_1, \mathbf{x}) := (x \le 1)$ and an affine (degree 1) template was generated in the previous step, the algorithm expands this to:

$$\forall \mathbf{x}\;\; 1 - x \ge 0 \Rightarrow \tilde{c}_{1,0} + \tilde{c}_{1,1} \cdot x + \tilde{c}_{1,2} \cdot r_1 + \tilde{c}_{1,3} \cdot r_2 \ge 0. \qquad (5)$$

The algorithm generates similar entailment constraints for every location and every requirement in Definitions 2 and 3.


Step 3. Quantifier Elimination. At the end of the previous step, we have a system of constraints of the form $\bigwedge_i \left(\forall \mathbf{x}\; A_i(\mathbf{x}) \Rightarrow b_i(\mathbf{x})\right)$. In this step, the algorithm sets out to eliminate the universal quantification over $\mathbf{x}$ in every constraint. First, consider the affine case. If $A_i$ is a set of linear inequalities over program variables and $b_i$ is one such linear inequality, then the algorithm attempts to write $b_i$ as a linear combination with non-negative coefficients of the inequalities in $A_i$ and the trivial inequality $1 \ge 0$. For example, it rewrites (5) as $\lambda_0 \cdot (1 - x) + \lambda_1 = \tilde{c}_{1,0} + \tilde{c}_{1,1} \cdot x + \tilde{c}_{1,2} \cdot r_1 + \tilde{c}_{1,3} \cdot r_2$, where the $\lambda_i$'s are new unknown variables for which we need to synthesize non-negative real values. This equality should hold for all valuations of program variables. Thus, we can equate the corresponding coefficients on both sides and obtain this equivalent system:

$$\begin{aligned} \lambda_0 + \lambda_1 &= \tilde{c}_{1,0} && \text{(the constant factor)}\\ -\lambda_0 &= \tilde{c}_{1,1} && \text{(coefficient of } x\text{)}\\ 0 &= \tilde{c}_{1,2} = \tilde{c}_{1,3} && \text{(coefficients of } r_1 \text{ and } r_2\text{)} \end{aligned} \qquad (6)$$

This transformation is clearly sound, and it is also complete due to the
well-known Farkas' lemma [17]. Now consider the polynomial case. Again, we write
$b_i$ as a combination of the polynomials in $A_i$. The only difference is that, instead
of non-negative real coefficients, we use sum-of-squares polynomials as our
multiplicands. For example, suppose our constraint is


$$\forall x\quad g_1(x) \ge 0 \wedge g_2(x) \ge 0 \;\Rightarrow\; g_3(x) \ge 0,$$
where the $g_i$'s are polynomials with unknown coefficients. The algorithm writes
$$g_3(x) = h_0(x) + h_1(x)\cdot g_1(x) + h_2(x)\cdot g_2(x), \tag{7}$$


where each $h_i$ is a sum-of-squares polynomial of degree at most $M$. The algorithm
sets up a template of degree $M$ for each $h_i$ and adds well-known quadratic
constraints that force it to be a sum of squares; see [2, Page 22] for details.
It then expands (7) and equates the corresponding coefficients of the LHS and
RHS, as in the linear case. The soundness of this transformation is trivial, since
each $h_i$ is a sum of squares and hence always non-negative. Completeness follows
from Putinar's Positivstellensatz [38]. Since the completeness arguments are
exactly the same as for the method in [2], we refer the reader to [2] for more
details and for an extension to entailments between strict polynomial
inequalities.
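To make the affine case of Step 3 concrete, the following Python sketch mechanically derives the coefficient equations (6) from the entailment (5). It is only an illustration under simplifying assumptions, not the authors' implementation (which uses SymPy): affine expressions are dictionaries from monomials ("1" for the constant term, otherwise a program-variable name) to coefficients, the premises have numeric coefficients, and the conclusion's coefficients are unknown names. The helper names `farkas_equations` and `show`, and the unknown names `lam0`, `lam1`, `c10`, ..., are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Affine expression over program variables: monomial -> coefficient ("1" = constant).
Affine = Dict[str, Fraction]
# Linear form over unknowns (template variables and multipliers): name -> coefficient.
Linear = Dict[str, Fraction]


def farkas_equations(premises: List[Affine],
                     conclusion: Dict[str, str],
                     prog_vars: List[str]) -> List[Tuple[str, Linear, Linear]]:
    """Encode  forall x: (p_1(x) >= 0 and ... and p_m(x) >= 0) => b(x) >= 0
    as  lam0 * 1 + sum_j lam_j * p_j(x) == b(x)  and equate coefficients.

    premises   -- the inequalities p_j >= 0, with numeric coefficients
    conclusion -- b, mapping each monomial to the name of its unknown coefficient
    Returns one (monomial, lhs, rhs) triple per monomial. Each lam_j must
    additionally be constrained to be non-negative.
    """
    equations = []
    for mono in ["1"] + prog_vars:
        lhs: Linear = {}
        if mono == "1":
            lhs["lam0"] = Fraction(1)  # multiplier of the trivial inequality 1 > 0
        for j, p in enumerate(premises, start=1):
            coeff = p.get(mono, Fraction(0))
            if coeff != 0:
                lhs[f"lam{j}"] = lhs.get(f"lam{j}", Fraction(0)) + coeff
        rhs: Linear = {conclusion[mono]: Fraction(1)} if mono in conclusion else {}
        equations.append((mono, lhs, rhs))
    return equations


def show(form: Linear) -> str:
    """Pretty-print a linear form such as {'lam1': -1} as '-1*lam1'."""
    return " + ".join(f"{c}*{v}" for v, c in form.items()) or "0"


if __name__ == "__main__":
    # Entailment (5): forall x. 1 - x >= 0 => c10 + c11*x + c12*r1 + c13*r2 >= 0
    premise = {"1": Fraction(1), "x": Fraction(-1)}
    template = {"1": "c10", "x": "c11", "r1": "c12", "r2": "c13"}
    for mono, lhs, rhs in farkas_equations([premise], template, ["x", "r1", "r2"]):
        print(f"[{mono}]  {show(lhs)} = {show(rhs)}")
```

The printed equations coincide with (6). In the actual algorithm the premises themselves may contain unknown template coefficients, so products of a multiplier with such an unknown appear; this is what turns the final system into a quadratic program, and this sketch deliberately leaves that case out.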


Step 4. Quadratic Programming. All of our constraints are now converted into a
Quadratic Programming (QP) instance over the template variables, e.g. see (6). Our
algorithm passes this QP instance to an SMT solver or a numerical optimizer. If
a solution is found, it plugs the values obtained for the template variables
back into the template of Step 1 and outputs the resulting termination witness
$(f, \eta, \epsilon)$.

We end this section by noting that our algorithm is sound and relatively
complete for synthesizing affine/polynomial quantitative termination certificates.


Theorem 5 (Soundness and Completeness in the Affine Case). Given
an affine pCFG $C$, an affine invariant $I$, and a non-termination upper-bound $p \in [0,1]$,
if $C$ admits a quantitative termination certificate $w = (f, \eta, \epsilon)$ in which both
$f$ and $\eta$ are affine expressions at every location, then $w$ corresponds to a solution
of the QP instance solved in Step 4 of the algorithm above. Conversely, every
such solution, when plugged back into the template of Step 1, leads to an affine
quantitative termination certificate showing that $C$ terminates with probability at
least $1 - p$ under every scheduler.


Theorem 6 (Soundness and Relative Completeness in the Polynomial
Case). Given a polynomial pCFG $C$, a polynomial invariant $I$ which is a compact
subset of $\mathbb{R}^{|V|}$ at every location $\ell$, and a non-termination upper-bound $p \in [0,1]$,
if $C$ admits a quantitative termination certificate $w = (f, \eta, \epsilon)$ in which both $f$
and $\eta$ are polynomial expressions of degree at most $\delta$ at every location, then there
exists an $M \in \mathbb{N}$ for which $w$ corresponds to a solution of the QP instance solved
in Step 4 of the algorithm above. Conversely, every such solution, when plugged
back into the template of Step 1, leads to a polynomial quantitative termination
certificate of degree at most $\delta$ showing that $C$ terminates with probability at least
$1 - p$ under every scheduler.


Proof. Step 2 encodes the conditions of an SI-indicator (Definition 2) and an RSM
(Definition 3). Theorem 4 shows that an SI-indicator together with an RSM is a
valid quantitative termination certificate. The transformation in Step 3 is sound
and complete, as argued in [2, Theorems 4 and 10]**. The affine version relies on
Farkas' lemma [17] and is complete with no additional constraints. The polynomial
version is based on Putinar's Positivstellensatz [38] and is only complete
for large enough $M$, i.e. a high-enough degree for the sum-of-squares multiplicands.
This is why we call our algorithm relatively complete. In practice, small values
of $M$ suffice to synthesize $w$, and we use $M = 2$ in all of our experiments.


** We need a more involved transformation for strict inequalities. See [2, Theorem 8]. 
7 Experimental Results


Implementation. We implemented a prototype of our approach in Python and 
used SymPy [33] for symbolic computations and the MathSAT5 SMT Solver [15] 
for solving the final QP instances. We also applied basic optimizations, e.g. check- 
ing the validity of each entailment and thus removing tautological constraints. 


Machine and Parameters. All results were obtained on an Intel Core i9-10885H
machine (8 cores, 2.4 GHz, 16 MB cache) with 32 GB of RAM running
Ubuntu 20.04. We always synthesized quadratic termination certificates and set
$\delta = M = 2$.


Benchmarks. We generated a variety of random walks with complicated behav- 
ior, including nested combinations of probabilistic and non-deterministic branch- 
ing and loops. We also took a number of benchmarks from [14]. Due to space 
limitations, in Table 1 we only present experimental results on a subset of our 
benchmark set, together with short descriptions of these benchmarks. Complete 
evaluation as well as details on all benchmarks are provided in [12, Appendix J]. 


Results and Discussion. Our experimental results are summarized in Table 1, 
with complete results provided in [12, Appendix J]. In every case, our approach 
was able to synthesize a certificate that the program terminates with probability 
at least 1 − p under any scheduler. Moreover, our runtimes are consistently small,
below 6 s per benchmark. Our approach was able to handle programs
that are beyond the reach of previous methods, including those with unbounded
differences and unbounded non-deterministic assignments, to which approaches
such as [14] and [40] are not applicable, as was demonstrated in [40]. This adds
experimental confirmation to our theoretical power-of-completeness result at the
end of Sect. 5, which showed the wider applicability of our method. Finally, it
is noteworthy that the termination probability lower-bounds reported in Table 1
are not tight. There are two reasons for this. First, while our theoretical approach
is sound and complete, our algorithm can only synthesize affine/polynomial
certificates for quantitative termination, and the best polynomial certificate of a
given degree might not be tight. Second, we rely on an SMT solver to solve
our QP instances. The QP instances often become harder as we decrease p,
leading to the solver's failure even though the constraints are satisfiable.


8 Related Works 


Supermartingale-Based Approaches. In addition to qualitative and quantitative
termination analyses, supermartingales have also been used for the formal
analysis of other properties of probabilistic programs, such as liveness and safety
properties [3,8,14,42] and cost analysis [36,43]. While all these works demonstrate
the effectiveness of supermartingale-based techniques, below we present a more
detailed comparison with other works that consider the automated computation
of lower bounds on termination probability.
Table 1. Summary of our experimental results on a subset of our benchmark set. See
[12, Appendix J] for benchmark details and for the results on all benchmarks. 


Benchmark | Short explanation | p | LBPT (1 − p) | Runtime (s)
----------|-------------------|---|--------------|------------
Figure 1 | Our running example | 0.01 | 0.99 | 2.38
Figure 7 | Nested probabilistic and non-deterministic branches leading to an infinite loop with maximum probability 0.25 | 0.25 | 0.75 | 1.40
Figure 9 | An a.s. terminating biased random walk with uniformly distributed steps | 0 | 1 | 0.73
Figure 10 | A random walk that starts at x = 10 and takes a step of Uniform(−2, 1) each time. Terminates if x < 0 and loops forever as soon as x > 100. | 0.12 | 0.88 | 1.10
Figure 11 | A 2-D random walk starting at (50, 50). In each iteration, x is incremented, while y is increased by Uniform(−1, 1). Terminates when x > 100. Loops when y < 0. | 0.07 | 0.93 | 3.52
Figure 14 | A 3-D random walk. In each iteration, each of x, y, z is incremented with a higher probability than decremented. Terminates when x + y + z < 0. | 0.999 | 0.001 | 3.22
Figure 15 | An example with both probabilistic and non-deterministic assignments | 0.5 | 0.49 | 2.73
Figure 16 | A variant of Fig. 15 with unbounded non-determinism in an assignment | 0.5 | 0.49 | 2.70
Figure 17 | A probabilistic branch between an a.s. terminating loop and a loop with small termination probability | 0.4 | 0.6 | 5.17
Figure 18 | A skewed random walk with two barriers, only one of which leads to program termination | 0.5 | 0.49 | 5.26
Figure 19 | Taken from [14] and conceptually similar to Fig. 5 | 0.24 | 0.76 | 0.94
Figure 22 | A more complicated and non-a.s.-terminating random walk taken from [14] | 0.1 | 0.9 | 1.15
Figure 23 | A 2-D variant of Fig. 22, also from [14] | 0.08 | 0.92 | 4.01


Comparison to [14]. The work of [14] introduces stochastic invariants and 
demonstrates their effectiveness for computing lower bounds on termination 
probability. However, their approach to computing stochastic invariants is based 
on repulsing supermartingales (RepSMs), and is orthogonal to ours. RepSMs 
were shown to be incomplete for computing stochastic invariants [40, Section 3]. 
Also, a RepSM is required to have bounded differences, i.e. the absolute difference
of its values in any two successor states needs to be bounded from above by some
positive constant. Given that the algorithmic approach of [14] computes linear 
RepSMs, this implies that the applicability of RepSMs is compromised in prac- 
tice as well, and is mostly suited to programs in which the quantity that behaves 
like a RepSM depends only on variables with bounded increments and sampling 
instructions defined by distributions of bounded support. Our approach does not 
impose such a restriction, and is the first to provide completeness guarantees. 


Comparison to [40]. The work of [40] introduces γ-scaled submartingales and
proves their effectiveness for computing lower bounds on termination probability.
Intuitively, for γ ∈ (0,1), a state function f is a γ-scaled submartingale if it is a
bounded non-negative function whose value in each non-terminal state decreases
in expected value at least by a factor of γ upon a one-step execution of the
pCFG. One may think of the second condition as a multiplicative decrease in
expected value. However, this condition is too strict, and γ-scaled submartingales
are not complete for lower bounds on termination probability [40, Example 6.6].


Comparison to [5]. The work of [5] proposes a type system for functional 
probabilistic programs that allows incrementally searching for type derivations 
and accumulating a lower bound on termination probability. In the limit, it finds
arbitrarily tight lower bounds on termination probability; however, it does not
provide any completeness or precision guarantees in finite time.


Other Approaches. Logical calculi for reasoning about properties of probabilistic
programs (including termination) were studied in [18,19,29] and extended to
programs with non-determinism in [27,28,31,37]. These works consider proof 
systems for probabilistic programs based on the weakest pre-expectation cal- 
culus. The expressiveness of this calculus allows reasoning about very complex 
programs, but the proofs typically require human input. In contrast, we aim for a 
fully automated approach for probabilistic programs with polynomial arithmetic. 
Connections between martingales and the weakest pre-expectation calculus were 
studied in [24]. A sound approach for proving almost-sure termination based on 
abstract interpretation is presented in [34]. 


Cores in MDPs. Cores, introduced in [30] for finite MDPs, are conceptually
equivalent to stochastic invariants. The work of [30] also presents a
sampling-based algorithm for their computation.


9 Conclusion 


We study the quantitative probabilistic termination problem in probabilistic pro- 
grams with non-determinism and propose the first relatively complete algorithm 
for proving termination with at least a given threshold probability. Our approach 
is based on a sound and complete characterization of stochastic invariants via 
the novel notion of stochastic invariant indicators, which allows for an effective 
and relatively complete algorithm for their computation. We then show that 
stochastic invariants are sound and complete certificates for proving that a pro- 
gram terminates with at least a given threshold probability. Hence, by combining 
our relatively complete algorithm for stochastic invariant computation with the 
existing relatively complete algorithm for computing ranking supermartingales, 
we present the first relatively complete algorithm for probabilistic termination. 
We have implemented a prototype of our algorithm and demonstrate its effec- 
tiveness on a number of probabilistic programs collected from the literature. 
Abstract. We study discrete probabilistic programs with potentially
unbounded looping behaviors over an infinite state space. We present, 
to the best of our knowledge, the first decidability result for the prob- 
lem of determining whether such a program generates exactly a specified 
distribution over its outputs (provided the program terminates almost- 
surely). The class of distributions that can be specified in our formalism 
consists of standard distributions (geometric, uniform, etc.) and finite 
convolutions thereof. Our method relies on representing these (possibly 
infinite-support) distributions as probability generating functions which 
admit effective arithmetic operations. We have automated our techniques 
in a tool called PRODIGY, which supports automatic invariance checking, 
compositional reasoning of nested loops, and efficient queries to the out- 
put distribution, as demonstrated by experiments. 


Keywords: Probabilistic programs · Quantitative verification ·
Program equivalence · Denotational semantics · Generating functions


1 Introduction 


Probabilistic programs [26,43,48] augment deterministic programs with stochas- 
tic behaviors, e.g., random sampling, probabilistic choice, and conditioning (via 
posterior observations). Probabilistic programs have undergone a recent surge 
of interest due to prominent applications in a wide range of domains: they 
steer autonomous robots and self-driving cars [20,54], are key to describing security
[6] and quantum [61] mechanisms, intrinsically code up randomized algorithms
for solving NP-hard or even deterministically unsolvable problems (in,
e.g., distributed computing [2,53]), and are rapidly encroaching on AI as well 


This research was funded by the ERC Advanced Project FRAPPANT under grant No. 
787914, by the EU’s Horizon 2020 research and innovation programme under the Marie 
Skłodowska-Curie grant No. 101008233, and by the DFG RTG 2236 UnRAVeL. 


© The Author(s) 2022 
S. Shoham and Y. Vizel (Eds.): CAV 2022, LNCS 13371, pp. 79-101, 2022. 
https://doi.org/10.1007/978-3-031-13185-1_5


80 M. Chen et al. 


as approximate computing [13]. See [5] for recent advancements in probabilistic 
programming. 

The crux of probabilistic programming, à la Hicks' interpretation [30], is to
treat normal-looking programs as if they were probability distributions. A random- 
number generator, for instance, is a probabilistic program that produces a uni- 
form distribution across numbers from a range of interest. Such a lift from deter- 
ministic program states to possibly infinite-support distributions (over states) 
renders the verification problem of probabilistic programs notoriously hard [39]. 
In particular, reasoning about probabilistic loops often amounts to computing 
quantitative fixed-points which are highly intractable in practice. As a conse- 
quence, existing techniques are mostly concerned with approximations, i.e., they 
strive for verifying or obtaining upper and/or lower bounds on various quantities 
like assertion-violation probabilities [59], preexpectations [9,28], moments [58], 
expected runtimes [40], and concentrations [15,16], which reveal only partial 
information about the probability distribution carried by the program. 

In this paper, we address the problem of how to determine whether a (possibly
infinite-state) probabilistic program yields exactly the desired (possibly
infinite-support) distribution under all possible inputs. We highlight two scenarios
where encoding the exact distribution, rather than (bounds on) the above-mentioned
quantities, is of particular interest: (I) In many safety- and/or security-critical
domains, e.g., cryptography, a slightly perturbed distribution (even if many of its
probabilistic quantities remain unchanged) may lead to significant attack
vulnerabilities or even a complete compromise of the cryptographic system; see, e.g.,
Bleichenbacher's biased-nonces attack [29, Sect. 5.10] against the probabilistic
Digital Signature Algorithm. Therefore, the system designer has to impose a
complete specification of the anticipated distribution produced by the
probabilistic component. (II) In the context of quantitative verification, the user may
be interested in multiple properties (of different types, e.g., the aforementioned
quantities) of the output distribution carried by a probabilistic program. In
the absence of the exact distribution, multiple analysis techniques, each tailored to
a different type of property, have to be applied in order to answer all queries from
the user. We further motivate our problem using the following concrete example.


Example 1 (Photorealistic Rendering [37]). Monte Carlo integration algorithms
form a well-known class of probabilistic programs which approximate complex
integral expressions by sampling [27]. One particular use case is the
photorealistic rendering of virtual scenes by a technique called Monte Carlo path
tracing (MCPT) [37].

MCPT works as follows: For every pixel of the output image, it shoots n 
sample rays into the scene and models the light transport behavior to approx- 
imate the incoming light at that particular point. Starting from a certain pixel 
position, MCPT randomly chooses a direction, traces it until a scene object is 
hit, and then proceeds by either (i) terminating the tracing and evaluating the 
overall ray, or (ii) continuing the tracing by computing a new direction. In the 
physical world, the light ray may be reflected arbitrarily often, and thus stopping
the tracing after a certain number of bounces would introduce a bias in the
while (n > 0) {
    running := 1 ⨾
    while (running = 1) {
        {running := 0} [1/2] {c := c + 1}
    } ⨾
    n := n − 1
}


Fig. 1. Monte Carlo path tracing in a scene with constant reflectivity 1/2. 


integral estimation. As a remedy, the decision when to stop the tracing is made
in a Russian roulette manner by flipping a coin¹ at each intersection point [1].
The program in Fig. 1 is an implementation of a simplified MCPT path
generator. The cumulative length of all n rays is stored in the (random) variable c,
which is directly proportional to MCPT's expected runtime. The implementation
is designed such that c is distributed as the sum of n independent and
identically distributed (i.i.d.) geometric random variables, so that the resulting
integral estimation is unbiased. In our framework, we view such an exact output
distribution of c as a specification and verify, fully automatically, that the
implementation in Fig. 1 with nested loops indeed satisfies this specification. ◁
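As a quick, purely empirical sanity check of the specification in Example 1 (and in no way a substitute for the exact reasoning developed in this paper), Fig. 1 can be transcribed into Python and sampled. The transcription is our own assumption: the probabilistic choice {running := 0} [1/2] {c := c + 1} is modeled by `rng.random() < 0.5`, c starts at 0, and the function names `run_fig1` and `estimate_moments` are hypothetical. If c is the sum of n i.i.d. geometric(1/2) variables, each counting the failures before the first success, its mean should be close to n and its variance close to 2n.

```python
import random


def run_fig1(n: int, rng: random.Random) -> int:
    """One execution of the program in Fig. 1; returns the final value of c (c = 0 initially)."""
    c = 0
    while n > 0:
        running = 1
        while running == 1:
            # {running := 0} [1/2] {c := c + 1}
            if rng.random() < 0.5:
                running = 0
            else:
                c += 1
        n -= 1
    return c


def estimate_moments(n: int, runs: int = 50_000, seed: int = 0):
    """Empirical mean and (population) variance of c over independent runs."""
    rng = random.Random(seed)
    samples = [run_fig1(n, rng) for _ in range(runs)]
    mean = sum(samples) / runs
    variance = sum((s - mean) ** 2 for s in samples) / runs
    return mean, variance


if __name__ == "__main__":
    m, v = estimate_moments(n=5)
    # The specification predicts mean n = 5 and variance 2n = 10 (up to sampling error).
    print(f"mean ~ {m:.3f}, variance ~ {v:.3f}")
```

Sampling can only corroborate the specification for a few fixed values of n; establishing it for all inputs is exactly what requires the symbolic techniques of this paper.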


Approach. Given a probabilistic loop L = while (φ) {P} with guard φ and
loop-free body P, we aim to determine whether L agrees with a specification S:


$$L = \mathtt{while}(\varphi)\{P\} \;\sim\; S, \tag{$\star$}$$


namely, whether L yields, upon termination, exactly the same distribution
as encoded by S under all possible program inputs. This problem is non-trivial:
(C1) L may induce an infinite state space and infinite-support distributions, thus
making techniques like probabilistic bounded model checking [34] insufficient for
verifying the property by means of unfolding the loop L. (C2) There is, to the
best of our knowledge, a lack of non-trivial characterizations of L and S such
that problem (⋆) admits a decidability result. (C3) To decide problem (⋆), even
for a loop-free program L, one has to account for infinitely or even uncountably
many inputs, such that L yields the same distribution as encoded by S when
deployed in all possible contexts.

We address challenge (C1) by exploiting the forward denotational seman- 
tics of probabilistic programs based on probability generating function (PGF) 
representations of (sub-)distributions [42], which benefits crucially from closed- 
form (i.e., finite) PGF representations of possibly infinite-support distributions. 
A probabilistic program L hence acts as a transformer $\llbracket L \rrbracket(\cdot)$ that transforms
an input PGF g into an output PGF $\llbracket L \rrbracket(g)$ (as an instantiation of Kozen's


1 The bias of the coin depends on the material’s reflectivity: a reflecting material such 
as a mirror requires more light bounces than an absorptive one, e.g., a black surface. 
transformer semantics [43]). In particular, we interpret the specification S as
a loop-free probabilistic program I. Such an identification of specifications with 
programs has two important advantages: (i) we only need a single language to 
encode programs as well as specifications, and (ii) it enables compositional rea- 
soning in a straightforward manner, in particular, the treatment of nested loops. 
The problem of checking L ~ S then boils down to checking whether L and I 
transform every possible input PGF into the same output PGF: 


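$$\forall g \in \mathrm{PGF}\colon\quad \llbracket \underbrace{\mathtt{while}(\varphi)\{P\}}_{L} \rrbracket(g) = \llbracket I \rrbracket(g). \tag{$\dagger$}$$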


As I is loop-free, problem (†) can be reduced to checking the equivalence of two
loop-free probabilistic programs (cf. Lemma 2):


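$$\forall g \in \mathrm{PGF}\colon\quad \llbracket \mathtt{if}\,(\varphi)\,\{P \fatsemi I\}\ \mathtt{else}\ \{\mathtt{skip}\} \rrbracket(g) = \llbracket I \rrbracket(g). \tag{$\ddagger$}$$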


Now challenge (C3) applies, since the universal quantification in problem (‡)
requires determining the equivalence against infinitely many, possibly
infinite-support, distributions over program states. We facilitate such equivalence
checking by developing a second-order PGF (SOP) semantics for probabilistic
programs, which naturally extends the PGF semantics while allowing us to reason
about infinitely many PGF transformations simultaneously (see Lemma 3).

Finally, to obtain a decidability result (cf. challenge (C2)), we develop the
rectangular discrete probabilistic programming language (ReDiP), a variant of
pGCL [46] with syntactic restrictions to rectangular guards, featuring various
nice properties: e.g., ReDiP programs inherently support i.i.d. sampling and, in
particular, preserve closed-form PGF when acting as PGF transformers. We show
that problem (‡) is decidable for ReDiP programs P and I if all the distribution
statements therein have rational closed-form PGF (cf. Lemma 4). As a
consequence, problem (†), and thereby problem (⋆) of checking L ∼ S, are decidable
if L terminates almost-surely on all possible inputs g (cf. Theorem 4).


Demonstration. We have automated our techniques in a tool called PRODIGY. As 
an example, PRODIGY was able to verify, fully automatically in 25 milliseconds, 
that the implementation of the MCPT path generator with nested loops (in 
Fig. 1) is indeed equivalent to the loop-free program 


c += iid(geometric(1/2), n) ⨾ n := 0


which encodes the specification that, upon termination, c is distributed as the 
sum of n i.i.d. geometric random variables. With such an output distribution, 
multiple queries can be efficiently answered by applying standard PGF
operations. For example, the expected value and variance of the runtime are
$\mathbb{E}[c] = n$ and $\mathrm{Var}[c] = 2n$, respectively (assuming c = 0 initially).
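For completeness, the two moments quoted above follow from standard PGF identities. The derivation below writes $G(T)$ for the PGF of c, i.e. the geometric PGF $(1-p)/(1-pT)$ with $p = 1/2$ raised to the n-th power (the n summands are independent), and uses $\mathbb{E}[c] = G'(1)$ and $\mathrm{Var}[c] = G''(1) + G'(1) - G'(1)^2$; it assumes c = 0 initially and treats n as a fixed natural number.

```latex
\begin{align*}
G(T)   &= \Bigl(\frac{1/2}{1 - T/2}\Bigr)^{n} = 2^{-n}\,(1 - T/2)^{-n},\\
G'(T)  &= 2^{-n}\cdot\tfrac{n}{2}\,(1 - T/2)^{-n-1},
  & G'(1)  &= 2^{-n}\cdot\tfrac{n}{2}\cdot 2^{n+1} = n,\\
G''(T) &= 2^{-n}\cdot\tfrac{n(n+1)}{4}\,(1 - T/2)^{-n-2},
  & G''(1) &= 2^{-n}\cdot\tfrac{n(n+1)}{4}\cdot 2^{n+2} = n(n+1),\\
\mathbb{E}[c] &= G'(1) = n,
  & \mathrm{Var}[c] &= G''(1) + G'(1) - G'(1)^2 = n(n+1) + n - n^2 = 2n.
\end{align*}
```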
Contributions. The main contributions of this paper are:


— The probabilistic programming language ReDiP and its forward denotational 
semantics as PGF transformers. We show that loop-free ReDiP programs pre- 
serve closed-form PGF. 

— The notion of SOP that enables reasoning about infinitely many PGF trans- 
formations simultaneously. We show that the problem of determining whether 
an infinite-state ReDiP loop generates — upon termination — exactly a specified 
distribution is decidable. 

— The software tool PRODIGY which supports automatic invariance checking 
on the source-code level; it allows reasoning about nested ReDiP loops in 
a compositional manner, and supports efficient queries on various quantities
including assertion-violation probabilities, expected values, (higher-order)
moments, precise tail probabilities, as well as concentration bounds.


Organization. We introduce generating functions in Sect. 2 and define the ReDiP
language in Sect. 3. Section 4 presents the PGF semantics. Section 5 establishes
our decidability result in reasoning about ReDiP loops, with case studies in 
Sect. 6. After discussing related work in Sect. 7, we conclude the paper in Sect. 8. 
Further details, e.g., proofs and additional examples, can be found in the full 
version [18]. 


2 Generating Functions 


“A generating function is a clothesline on which we hang up a sequence 
of numbers for display.” — H. S. Wilf, Generatingfunctionology [60] 


The method of generating functions (GF) is a vital tool in many areas of
mathematics. This includes, in particular, enumerative combinatorics [22,60] and,
most relevant for this paper, probability theory [35]. In the latter, the sequences
"hanging on the clotheslines" happen to describe probability distributions over
the non-negative integers ℕ, e.g., 1/2, 1/4, 1/8, ... (aka the geometric distribution).

The most common way to relate an (infinite) sequence of numbers to a
generating function relies on the familiar Taylor series expansion: given a sequence,
for example 1/2, 1/4, 1/8, ..., find a function $x \mapsto f(x)$ whose Taylor series around
$x = 0$ uses the numbers in the sequence as coefficients. In our example,

$$\frac{1}{2 - x} = \frac{1}{2} + \frac{1}{4}x + \frac{1}{8}x^2 + \frac{1}{16}x^3 + \frac{1}{32}x^4 + \cdots \tag{1}$$

for all $|x| < 2$; hence the "clothesline" used for hanging up 1/2, 1/4, 1/8, ... is the
function $1/(2 - x)$. Note that, from a purely syntactical point of view, the GF is a
finite object, while the sequence it represents is infinite. A key strength
of this technique is that many meaningful operations on infinite series can be
performed by manipulating an encoding GF (see Table 1 for an overview and
examples). In other words, GF provide an interface to perform operations on
and extract information from infinite sequences in an effective manner.
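The claim that the finite object 1/(2 − x) encodes the infinite sequence 1/2, 1/4, 1/8, ... can be checked mechanically. The sketch below is our own illustration (the helper name `series_inverse` is hypothetical): it computes the first coefficients of the multiplicative inverse of a univariate power series with non-zero constant term, using exact rational arithmetic and the recurrence obtained by requiring every coefficient of f · f⁻¹ other than the constant one to vanish.

```python
from fractions import Fraction
from typing import List, Sequence


def series_inverse(f: Sequence[Fraction], terms: int) -> List[Fraction]:
    """First `terms` coefficients of 1/f for a univariate power series f.

    f is given by its listed leading coefficients; unlisted coefficients are
    treated as 0, so polynomials such as 2 - X are handled exactly.
    Requires f[0] != 0, otherwise 1/f does not exist as a power series.
    """
    if not f or f[0] == 0:
        raise ValueError("constant term must be non-zero for 1/f to exist")
    g: List[Fraction] = [Fraction(1) / f[0]]
    for k in range(1, terms):
        # Coefficient k of f * g must vanish: f_0 g_k + sum_{i>=1} f_i g_{k-i} = 0.
        acc = sum((Fraction(f[i]) * g[k - i] for i in range(1, min(k, len(f) - 1) + 1)),
                  Fraction(0))
        g.append(-acc / f[0])
    return g


if __name__ == "__main__":
    # 1/(2 - X): the "clothesline" for 1/2, 1/4, 1/8, ...
    print(series_inverse([Fraction(2), Fraction(-1)], 6))
```

A rational closed form g/h can be expanded the same way by first inverting h and then multiplying by the polynomial g.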
2.1 The Ring of Formal Power Series


Towards our goal of encoding distributions over program states (valuations of 
finitely many integer variables) as generating functions, we need to consider 
multivariate GF, i.e., GF with more than one variable. Such functions repre- 
sent multidimensional sequences, or arrays. Since multidimensional Taylor series 
quickly become unwieldy, we will follow a more algebraic approach that is also
advocated in [60]: we treat sequences and arrays as elements of an algebraic
structure, the ring of Formal Power Series (FPS). Recall that a (commutative)
ring $(A, +, \cdot, 0, 1)$ consists of a non-empty carrier set $A$, associative and
commutative binary operations "+" (addition) and "·" (multiplication) such that
multiplication distributes over addition, and neutral elements 0 and 1 w.r.t.
addition and multiplication, respectively. Further, every $a \in A$ has an additive
inverse $-a \in A$. Multiplicative inverses $a^{-1} = 1/a$ need not always exist. Let
$k \in \mathbb{N} = \{0, 1, \dots\}$ be fixed in the remainder.


Table 1. GF cheat sheet. f, g and X, Y are arbitrary GF and indeterminates, resp.


Operation | Effect | (Running) example
----------|--------|------------------
$f^{-1} = 1/f$ | Multiplicative inverse of f (if it exists) | $\frac{1}{1-XY} = 1 + XY + X^2Y^2 + \cdots$ because $(1 - XY)(1 + XY + X^2Y^2 + \cdots) = 1$
$fX$ | Shift in dimension X | $\frac{X}{1-XY} = X + X^2Y + X^3Y^2 + \cdots$
$f[X/0]$ | Drop terms containing X | $\frac{1}{1-XY}[X/0] = 1$
$f[X/1]$ | Projection^a on Y | $\frac{1}{1-XY}[X/1] = \frac{1}{1-Y} = 1 + Y + Y^2 + \cdots$
$fg$ | Discrete convolution (or Cauchy product) | $\frac{1}{(1-XY)^2} = 1 + 2XY + 3X^2Y^2 + \cdots$
$\partial_X f$ | Formal derivative in X | $\partial_X \frac{1}{1-XY} = \frac{Y}{(1-XY)^2} = Y + 2XY^2 + 3X^2Y^3 + \cdots$
$f + g$ | Coefficient-wise sum | $\frac{1}{1-XY} + \frac{1}{(1-XY)^2} = \frac{2-XY}{(1-XY)^2} = 2 + 3XY + 4X^2Y^2 + \cdots$
$af$ | Coefficient-wise scaling | $\frac{7}{(1-XY)^2} = 7 + 14XY + 21X^2Y^2 + \cdots$

^a Projections are not always well-defined, e.g., $\frac{1}{1-X+Y}[X/1] = \frac{1}{Y}$ is ill-defined because Y is
not invertible. However, in all situations where we use projection it will be well-defined; in
particular, projection is well-defined for PGF.


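Definition 1 (The Ring of FPS). A k-dimensional FPS is a k-dim. array
$f\colon \mathbb{N}^k \to \mathbb{R}$. We denote FPS as formal sums as follows: Let $\mathbf{X} = (X_1, \dots, X_k)$ be
an ordered vector of symbols, called indeterminates. The FPS f is written as


$$f = \sum_{\sigma \in \mathbb{N}^k} f(\sigma)\, \mathbf{X}^{\sigma},$$


where $\mathbf{X}^{\sigma}$ is the monomial $X_1^{\sigma_1} X_2^{\sigma_2} \cdots X_k^{\sigma_k}$. The ring of FPS is denoted $\mathbb{R}[[\mathbf{X}]]$,
where the operations are defined as follows: For all $f, g \in \mathbb{R}[[\mathbf{X}]]$ and $\sigma \in \mathbb{N}^k$,


$$(f + g)(\sigma) = f(\sigma) + g(\sigma), \quad\text{and}\quad (f \cdot g)(\sigma) = \sum_{\sigma_1 + \sigma_2 = \sigma} f(\sigma_1)\, g(\sigma_2).$$


The multiplication $f \cdot g$ is the usual Cauchy product of power series (aka
discrete convolution); it is well defined because for all $\sigma \in \mathbb{N}^k$ there are just
finitely many $\sigma_1 + \sigma_2 = \sigma$ in $\mathbb{N}^k$. We write $fg$ instead of $f \cdot g$.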
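Definition 1 translates directly into code if only finitely many coefficients are kept. The following sketch is our own illustration, not part of PRODIGY, and its names (`fps_add`, `fps_mul`, `geometric_xy`) are hypothetical: it stores a k-dimensional FPS, truncated to exponents of total degree at most `max_deg`, as a dictionary from exponent tuples σ to coefficients. Because every σ of total degree at most `max_deg` only decomposes into σ₁ + σ₂ with both parts of at most that degree, the truncated Cauchy product is exact on every retained σ.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple

Exponent = Tuple[int, ...]
FPS = Dict[Exponent, Fraction]  # sigma -> f(sigma); absent keys mean 0


def fps_add(f: FPS, g: FPS) -> FPS:
    """Coefficient-wise sum: (f + g)(sigma) = f(sigma) + g(sigma)."""
    out = dict(f)
    for s, c in g.items():
        out[s] = out.get(s, Fraction(0)) + c
    return {s: c for s, c in out.items() if c != 0}


def fps_mul(f: FPS, g: FPS, max_deg: int) -> FPS:
    """Cauchy product (f * g)(sigma) = sum_{s1 + s2 = sigma} f(s1) g(s2),
    keeping only exponents sigma of total degree <= max_deg."""
    out: FPS = {}
    for (s1, c1), (s2, c2) in product(f.items(), g.items()):
        s = tuple(a + b for a, b in zip(s1, s2))
        if sum(s) <= max_deg:
            out[s] = out.get(s, Fraction(0)) + c1 * c2
    return {s: c for s, c in out.items() if c != 0}


def geometric_xy(max_deg: int) -> FPS:
    """1/(1 - XY) = 1 + XY + X^2Y^2 + ..., truncated at total degree max_deg."""
    return {(k, k): Fraction(1) for k in range(max_deg // 2 + 1)}


if __name__ == "__main__":
    D = 6
    f = geometric_xy(D)
    # 1/(1 - XY)^2 = 1 + 2XY + 3X^2Y^2 + ..., cf. the row "fg" of Table 1
    print(sorted(fps_mul(f, f, D).items()))
```

The same dictionary representation reproduces the "f + g" row of Table 1 via `fps_add(f, fps_mul(f, f, D))`.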
The formal sum notation is standard in the literature and often useful because
the arithmetic FPS operations are very similar to how one would do calculations 
with “real” sums. We stress that the indeterminates X are merely labels for 
the k dimensions of f and do not have any other particular meaning. In the 
context of this paper, however, it is natural to identify the indeterminates with 
the program variables (e.g. indeterminate X refers to variable x, see Sect. 3). 

Equation (1) can be interpreted as follows in the ring of FPS: the "sequences"
$2 - 1X + 0X^2 + \cdots$ and $\frac{1}{2} + \frac{1}{4}X + \frac{1}{8}X^2 + \cdots$ are (multiplicative) inverse
elements to each other in $\mathbb{R}[[X]]$, i.e., their product is 1. More generally, we say
that an FPS f is rational if $f = gh^{-1} = g/h$ where g and h are polynomials,
i.e., they have at most finitely many non-zero coefficients; and we call such a
representation a rational closed form.

A more extensive introduction to FPS can be found in [18, Appx. D]. 


2.2 Probability Generating Functions


We are especially interested in GF that describe probability distributions.


Definition 2 (PGF). A k-dimensional FPS g is a probability generating function
(PGF) if (i) for all $\sigma \in \mathbb{N}^k$ we have $g(\sigma) \ge 0$, and (ii) $\sum_{\sigma \in \mathbb{N}^k} g(\sigma) \le 1$.


For example, (1) is the PGF of a 1/2-geometric distribution. The PGF of other 
standard distributions are given in Table 3 further below. Note that Definition 2 
also includes sub-PGF where the sum in (ii) is strictly less than 1. 
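Definition 2 can be checked directly for FPS with finitely many non-zero coefficients. The sketch below is our own illustration (the helper name `pgf_mass` is hypothetical): it verifies conditions (i) and (ii) and returns the total mass, i.e. the value obtained by substituting 1 for every indeterminate. Truncating the geometric PGF (1) after ten terms shows how a sub-PGF arises: the missing mass is exactly the discarded tail.

```python
from fractions import Fraction
from typing import Dict, Tuple

FPS = Dict[Tuple[int, ...], Fraction]  # finite support: sigma -> g(sigma)


def pgf_mass(g: FPS) -> Fraction:
    """Check Definition 2 for a finite-support FPS and return its total mass.

    Condition (i): every coefficient is non-negative.
    Condition (ii): the coefficients sum to at most 1.
    A returned value < 1 means that g is a sub-PGF.
    """
    if any(c < 0 for c in g.values()):
        raise ValueError("not a PGF: negative coefficient, violates (i)")
    mass = sum(g.values(), Fraction(0))
    if mass > 1:
        raise ValueError("not a PGF: total mass exceeds 1, violates (ii)")
    return mass


if __name__ == "__main__":
    # The first ten coefficients of (1): 1/2, 1/4, ..., 1/1024.
    truncated = {(k,): Fraction(1, 2 ** (k + 1)) for k in range(10)}
    mass = pgf_mass(truncated)
    print(mass, 1 - mass)  # total mass and the mass lost by truncation
```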


3  ReDiP: A Probabilistic Programming Language 


This section presents our Rectangular Discrete Probabilistic Programming Lan- 
guage, or ReDiP for short. The word “rectangular” refers to a restriction we 
impose on the guards of conditionals and loops, see Sect. 3.2. ReDiP is a variant 
of pGCL [46] with some extra syntax but also some syntactic restrictions. 


3.1 Program States and Variables 


Every ReDiP-program P operates on a finite set of ℕ-valued program variables
$\mathrm{Vars}(P) = \{x_1, \dots, x_k\}$. We do not consider negative or non-integer variables. A
program state of P is thus a mapping $\sigma\colon \mathrm{Vars}(P) \to \mathbb{N}$. As explained in Sect. 1,
the key idea is to represent distributions over such program states as PGF.
Consequently, we identify a single program state σ with the monomial
$\mathbf{X}^{\sigma} = X_1^{\sigma(x_1)} \cdots X_k^{\sigma(x_k)}$, where $X_1, \dots, X_k$ are indeterminates representing the program
variables $x_1, \dots, x_k$. We will stick to this notation: throughout the whole paper,
we typeset program variables as x and the corresponding FPS indeterminate as
X. The initial program state on which a given ReDiP-program is supposed to
operate must always be stated explicitly.




3.2 Syntax of ReDiP 


The syntax of ReDiP is defined inductively, see the leftmost column of Table 2. 
Here, x and y are program variables, n ∈ N is a constant, D is a distribution expres-
sion (see Table 3), and P_1, P_2 are ReDiP-programs. The general idea of ReDiP is to
provide a minimal core language to keep the theory simple. Many other common 
language constructs such as linear arithmetic updates x := 2y + 3 are expressible 
in this core language. See [18, Appx. A] for a complete specification. 


Table 2. Syntax and semantics of ReDiP. g is the input PGF.


ReDiP-program P        | Semantics ⟦P⟧(g), see Sect. 4.2                             | Description
x := n                 | g[X/1] · X^n                                               | Assign constant n ∈ N to variable x
x--                    | (g − g[X/0]) · X^{-1} + g[X/0]                             | Decrement x (“monus” semantics)
x += iid(D, y)         | g[Y/Y · ⟦D⟧[T/X]]                                          | Increment x by the sum of y i.i.d. samples from D, see Sect. 3.3
if (x < n) {P_1}       | ⟦P_1⟧(g_{x<n}) + ⟦P_2⟧(g − g_{x<n}), where                 | Conditional branching
else {P_2}             |   g_{x<n} = Σ_{i=0}^{n−1} (1/i!) · (∂^i g/∂X^i)[X/0] · X^i  |
P_1 ; P_2              | ⟦P_2⟧(⟦P_1⟧(g))                                            | Sequential composition
while (x < n) {P_1}    | (lfp Ψ_{x<n,P_1})(g), where                                | Loop defined as fixed point
                       |   Ψ_{x<n,P_1}(ψ) = λf. (f − f_{x<n}) + ψ(⟦P_1⟧(f_{x<n}))   |


Table 3. A non-exhaustive list of common discrete distributions with rational PGF.
The parameters p, n, and λ are a probability, a natural, and a non-negative real number,
respectively. T is a reserved placeholder indeterminate.


D                 | ⟦D⟧                      | Description
dirac(n)          | T^n                      | Point mass
bernoulli(p)      | 1 − p + pT               | Bernoulli distribution (coin flip)
unif(n)           | (1 − T^n)/(n(1 − T))     | Discrete uniform distribution on {0, ..., n − 1}
geometric(p)      | (1 − p)/(1 − pT)         | Geometric distribution (no. trials until first success)
binomial(p, n)    | (1 − p + pT)^n           | Binomial distribution (successes of n yes-no experiments)
nbinomial(p, n)   | (1 − p)^n/(1 − pT)^n     | Negative binomial distribution
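To make Table 3 concrete, the following Python sketch expands a rational closed form num/den into the first few coefficients of its power series in T, using exact arithmetic, and checks such a coefficient prefix against Definition 2. The helper names (`series_div`, `binomial_poly`, `prefix_violates_pgf`) are our own and not part of any tool mentioned in this paper; note that a finite prefix can only refute the PGF property, never establish it.

```python
from fractions import Fraction
from math import comb


def series_div(num, den, order):
    """First `order` coefficients of the formal power series num/den in T.

    num and den are coefficient lists [c0, c1, ...]; den[0] must be non-zero,
    so that den is invertible in R[[T]].
    """
    if den[0] == 0:
        raise ValueError("den must have a non-zero constant term")
    out = []
    for i in range(order):
        # Coefficient i of (out * den) must equal num[i]; solve for out[i].
        acc = Fraction(num[i]) if i < len(num) else Fraction(0)
        for j in range(1, min(i, len(den) - 1) + 1):
            acc -= den[j] * out[i - j]
        out.append(acc / den[0])
    return out


def binomial_poly(a, b, n):
    """Coefficient list of (a + b*T)**n."""
    return [comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1)]


def prefix_violates_pgf(coeffs):
    """True if a coefficient prefix already violates Definition 2 (1-dim. case)."""
    total = Fraction(0)
    for c in coeffs:
        total += c
        if c < 0 or total > 1:
            return True
    return False


p = Fraction(1, 2)
geo = series_div([1 - p], [1, -p], 6)                        # geometric(1/2)
uni = series_div([1, 0, 0, -1], [3, -3], 6)                  # unif(3)
nb = series_div([(1 - p) ** 2], binomial_poly(1, -p, 2), 5)  # nbinomial(1/2, 2)
```

For instance, `geo` holds the coefficients 1/2, 1/4, ..., 1/64 of the geometric(1/2) PGF, and `uni` holds 1/3, 1/3, 1/3, 0, 0, 0, since the numerator 1 − T^3 cancels all higher terms.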


The word “rectangular” in ReDiP emphasizes that our if-guards can only
identify axis-aligned hyper-rectangles² in N^k, but no more general polyhedra.
These rectangular guards x < n have the fundamental property that they pre-
serve rational PGF. On the other hand, allowing more general guards like x < y
breaks this property (see [21] and our comments in [18, Appx. B]).

The most intricate feature of ReDiP is the — potentially unbounded — loop 
while (x < n) {P}. A program that does not contain loops is called loop-free. 


² More precisely, we can simulate statements like if (R) {...} else {...}, where R is
a finite Boolean combination of rectangular guards, through appropriate nesting of
if-statements; note that such an R is indeed a finite union of axis-aligned rectangles in N^k.
3.3 The Statement x += iid(D, y)


The novel iid statement is the heart of the loop-free fragment of ReDiP — it 
subsumes both x := D (“assign a D-distributed sample to x”) and the standard 
assignment x := y. We include the assign-increment (+=) version of iid in the 
core fragment of ReDiP for technical reasons; the assignment x := iid(D, y) can 
be recovered from that as syntactic sugar by simply setting x := 0 beforehand. 

Intuitively, the meaning of x += iid(D, y) is as follows. The right-hand side 
iid(D, y) can be seen as a function that takes the current value v of variable 
y, then draws v i.i.d. samples from distribution D, computes the sum of all 
these samples and finally increments x by the so-obtained value. For example, 
to perform x := y, we may just write x := iid(dirac(1), y) as this will draw 
y times the number 1, then sum up these y many 1’s to obtain the result y 
and assign it to x. Similarly, to assign a random sample from, say, a uniform
distribution to x, we can execute y := 1 ; x := iid(unif(n), y).
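The intuitive, sampling-based reading of x += iid(D, y) can be sketched in Python as follows. This is only an operational illustration with hypothetical sampler functions (`dirac`, `unif`, `bernoulli`) of our own; it is not the PGF-based semantics defined in Sect. 4, which computes the whole output distribution instead of drawing samples.

```python
import random


def iid_sum(sampler, v, rng):
    """Draw v i.i.d. samples with sampler(rng) and return their sum."""
    return sum(sampler(rng) for _ in range(v))


# Hypothetical samplers for three distributions from Table 3.
def dirac(n):
    return lambda rng: n


def unif(n):
    return lambda rng: rng.randrange(n)  # uniform on {0, ..., n-1}


def bernoulli(p):
    return lambda rng: 1 if rng.random() < p else 0


rng = random.Random(42)
y = 7
x = 0
x += iid_sum(dirac(1), y, rng)   # x := iid(dirac(1), y) copies y into x
z = iid_sum(unif(6), 1, rng)     # y := 1 ; z := iid(unif(6), y): one uniform sample
```

With y = 0, the sum is empty and the variable is left unchanged, which matches the += reading.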

But iid is not only useful for defining standard operations. In fact, taking
sums of i.i.d. samples is common in probability theory. The binomial distribution
with parameters p ∈ (0,1) and n ∈ N, for example, is defined as the sum of
n i.i.d. Bernoulli(p)-distributed samples and thus


x := binomial(p, y) is equivalent to x := iid(bernoulli(p), y) 


for all constants p ∈ (0,1). Similarly, the negative (p,n)-binomial distribution
is the sum of n i.i.d. geometric(p)-distributed samples. Overall, iid renders the
loop-free fragment of ReDiP strictly more expressive than it would be if we had
included only x := D and x := y instead. As a consequence, since we use loop-
free programs as a specification language (see Sect. 5), iid enables us to write
more expressive program specifications while retaining decidability.
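Both equivalences can be sanity-checked with exact arithmetic, because the pmf of a sum of independent samples is the convolution of their pmfs. The sketch below (helper names are ours) compares the n-fold convolution of Bernoulli and geometric pmfs with the binomial and negative binomial pmfs, i.e., with the coefficients of the corresponding PGF in Table 3.

```python
from fractions import Fraction
from math import comb


def convolve(a, b):
    """pmf of the sum of two independent N-valued r.v.s given as pmf lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def iid_sum_pmf(pmf, n):
    """pmf of the sum of n i.i.d. samples; n = 0 yields the point mass at 0."""
    result = [Fraction(1)]
    for _ in range(n):
        result = convolve(result, pmf)
    return result


p, n, K = Fraction(1, 3), 4, 8

# binomial(p, n) versus the sum of n i.i.d. bernoulli(p) samples.
binom_via_iid = iid_sum_pmf([1 - p, p], n)
binom_direct = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# nbinomial(p, n) versus the sum of n i.i.d. geometric(p) samples. The geometric
# pmf (1-p)p^k is truncated to K entries; the first K entries of the n-fold
# convolution are still exact, because they only involve values below K.
geo = [(1 - p) * p**k for k in range(K)]
nbinom_via_iid = iid_sum_pmf(geo, n)[:K]
nbinom_direct = [comb(k + n - 1, k) * p**k * (1 - p) ** n for k in range(K)]
```

Both pairs of lists agree entry by entry, which is exactly the coefficient-wise reading of ⟦bernoulli(p)⟧^n = ⟦binomial(p, n)⟧ and ⟦geometric(p)⟧^n = ⟦nbinomial(p, n)⟧.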


4 Interpreting ReDiP with PGF 


In this section, we explain the PGF-based semantics of our language which is 
given in the second column of Table 2. The overall idea is to view a ReDiP- 
program P as a distribution transformer [44,46]. This means that the input to 
P is a distribution over initial program states (inputting a deterministic state 
is just the special case of a Dirac distribution), and the output is a distribution 
over final program states. With this interpretation, if one regards distributions 
as generalized program states [33], a probabilistic program is actually determinis- 
tic: The same input distribution always yields the same output distribution. The 
goal of our PGF-based semantics is to construct an interpreter that executes a 
ReDiP-program statement-by-statement in forward direction, transforming one 
generalized program state into the next. We stress that these generalized program
states, or distributions, may have infinite support in general. For example,
the program x := geometric(0.5) outputs a geometric distribution on x, which has
infinite support.
4.1 A Domain for Distribution Transformation


We now define a domain, i.e., an ordered structure, where our program’s in- and 
output distributions live. Following the general idea of this paper, we encode 
them as PGF. Let Vars be a fixed finite set of program variables x_1, ..., x_k and
let X = (X_1, ..., X_k) be corresponding formal indeterminates. We let PGF =
{g ∈ R[[X]] | g is a PGF} denote the set of all PGF. Recall that this also includes
sub-PGF (Definition 2). Further, we equip PGF with the pointwise order, i.e., we
let g ⊑ f iff g(σ) ≤ f(σ) for all σ ∈ N^k. It is clear that (PGF, ⊑) is a partial order
that is moreover ω-complete, i.e., there exists a least element 0 and all ascending
chains Γ = {g_0 ⊑ g_1 ⊑ ...} in PGF have a least upper bound sup Γ ∈ PGF. The
maxima in (PGF, ⊑) are precisely the PGF which are not a sub-PGF.
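As a small illustration of this domain, the sketch below represents finitely supported (sub-)PGF as Python dicts from exponent tuples σ to coefficients g(σ) (our own representation) and implements the pointwise order ⊑. Truncating the geometric(1/2) PGF after n terms yields an ascending chain of sub-PGF that starts at the least element 0; its supremum, the pointwise limit, is the full geometric PGF, which has total mass 1 and is therefore a maximal element.

```python
from fractions import Fraction


def leq(g, f):
    """Pointwise order g ⊑ f for finitely supported (sub-)PGF given as dicts."""
    return all(c <= f.get(s, 0) for s, c in g.items())


def mass(g):
    """Total probability mass g[X/1]."""
    return sum(g.values(), Fraction(0))


def truncated_geometric(n):
    """Sub-PGF keeping the first n coefficients of the geometric(1/2) PGF."""
    return {(k,): Fraction(1, 2 ** (k + 1)) for k in range(n)}


# An ascending chain starting at the least element 0 (the empty dict).
chain = [truncated_geometric(n) for n in range(8)]
```

Each chain element misses exactly 1/2^n of the mass, so the masses increase towards 1 as n grows.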


4.2 From Programs to PGF Transformers 


Next we explain how distribution transformation works using (P)GF (cf. 
Table 1). This is in contrast to the PGF semantics from [42] which operates 
on infinite sums in a non-constructive fashion. 


Definition 3 (The PGF Transformer ⟦P⟧). Let P be a ReDiP-program. The
PGF transformer ⟦P⟧: PGF → PGF is defined inductively on the structure of P
through the second column in Table 2.


We show in Theorem 2 below that ⟦P⟧ is well-defined. For now, we go over
the statements in the language ReDiP and explain the semantics.


Sequential Composition. The semantics of P_1 ; P_2 is straightforward and intuitive:
First execute P_1 on g and then P_2 on ⟦P_1⟧(g), i.e., ⟦P_1 ; P_2⟧(g) = ⟦P_2⟧(⟦P_1⟧(g)).
The fact that our semantics transformer moves forwards through the program
(as program interpreters usually do) is due to this definition.


Conditional Branching. To translate if (x < n) {P_1} else {P_2}, we follow the
standard procedure which partitions the input distribution according to x < n
and x ≥ n, processes the two parts independently and finally recombines
the results [44]. We realize the partitioning using the (formal) Taylor series
expansion. This is feasible because we only allow rectangular guards of the form
x < n, where n is a constant. Thus, for a given input PGF g, the filtered PGF
g_{x<n} is obtained by keeping the first n terms of the Taylor expansion of g in X
around X = 0, i.e., the terms whose X-degree is below n. The else-part is then
simply g_{x≥n} = g − g_{x<n}. We then evaluate ⟦P_1⟧(g_{x<n}) + ⟦P_2⟧(g_{x≥n}) recursively.
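For a polynomial g (finite support), the Taylor-based filter from Table 2 coincides with simply keeping the monomials whose X-degree is below n. The sketch below implements both variants over a dict representation (exponent tuples mapped to `Fraction` coefficients) and checks that they agree; the representation and function names are our own assumptions. For rational closed forms, the same Taylor formula would instead be evaluated symbolically, e.g., by a computer algebra system.

```python
from fractions import Fraction
from math import factorial


def filter_by_exponent(g, var, n):
    """g_{x<n}: keep the monomials whose exponent of variable `var` is below n."""
    return {s: c for s, c in g.items() if s[var] < n}


def derivative(g, var):
    """Formal partial derivative with respect to variable number `var`."""
    out = {}
    for s, c in g.items():
        if s[var] > 0:
            t = s[:var] + (s[var] - 1,) + s[var + 1:]
            out[t] = out.get(t, 0) + c * s[var]
    return out


def filter_by_taylor(g, var, n):
    """g_{x<n} = sum_{i<n} (1/i!) (d^i g / dX^i)[X/0] X^i, as in Table 2."""
    out = {}
    d = g
    for i in range(n):
        for s, c in d.items():
            if s[var] == 0:                          # substitute X = 0
                t = s[:var] + (i,) + s[var + 1:]     # multiply by X^i
                out[t] = out.get(t, 0) + c / factorial(i)
        d = derivative(d, var)
    return out


# g = 0.2 + 0.3*X*Y + 0.5*X^3*Y^2 over variables (x, y); guard x < 2.
g = {(0, 0): Fraction(1, 5), (1, 1): Fraction(3, 10), (3, 2): Fraction(1, 2)}
then_part = filter_by_taylor(g, 0, 2)
else_part = {s: c - then_part.get(s, 0) for s, c in g.items()
             if c != then_part.get(s, 0)}                      # g - g_{x<n}
```

Here `then_part` is 0.2 + 0.3XY and `else_part` is 0.5X^3Y^2, so the two branches together again carry the full mass of g.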
Assigning a Constant. Technically, our semantics realizes an assignment x := n
in two steps: It first sets x to 0 and then increments it by n. The former is
achieved by substituting 1 for X, i.e., computing g[X/1], which corresponds to computing
the marginal distribution in all variables except X. For example,


    // 0.5XY^2 + 0.5X^2Y^3                (g)
    x := 5                                (P)
    // (0.5Y^2 + 0.5Y^3)X^5               (⟦P⟧(g))
    // 0.5X^5Y^2 + 0.5X^5Y^3              (reformulation of the previous line)


where the annotations in the right-hand column explain this annotation style [42]. Note that
0.5Y^2 + 0.5Y^3 is indeed the marginal of the input distribution in Y.
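On a finitely supported PGF, the rule g[X/1] · X^n for x := n simply overwrites the exponent of X with n and adds up the coefficients of monomials that now coincide. A minimal sketch, using a dict from exponent tuples to coefficients as our own representation:

```python
from fractions import Fraction


def assign_const(g, var, n):
    """x := n on a finitely supported PGF: g[X/1] * X^n.

    Substituting X = 1 merges all monomials that differ only in the exponent
    of X (marginalization); multiplying by X^n then sets that exponent to n.
    """
    out = {}
    for s, c in g.items():
        t = s[:var] + (n,) + s[var + 1:]
        out[t] = out.get(t, 0) + c
    return out


# Variables ordered (x, y); g = 1/4*X*Y + 1/4*X^2*Y + 1/2*X^3*Y^2.
g = {(1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4), (3, 2): Fraction(1, 2)}
result = assign_const(g, 0, 5)   # 1/2*X^5*Y + 1/2*X^5*Y^2
```

The two monomials with y = 1 merge into one, so the marginal in Y, 0.5Y + 0.5Y^2, is preserved.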


Decrementing a Variable. Since our program variables cannot take negative values,
we define x-- as max(x − 1, 0), i.e., x monus (modified minus) 1. Technically,
we realize this through if (x < 1) {skip} else {x--}, i.e., we apply the decrement
only to the portion of the input distribution where x ≥ 1. The decrement
itself can then be carried out through “multiplication by X^{-1}”. Note that X^{-1}
is not an element of R[[X]] because X has no inverse. Instead, the operation
g·X^{-1} is an alias for shift(g), which shifts g “to the left” in dimension X. To
implement the semantics on top of existing computer algebra software, it is very
handy to perform the multiplication by X^{-1} instead. This is justified because
for PGF g with g[X/0] = 0, shift(g) and g·X^{-1} are equal.
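On a finitely supported PGF, the decrement rule (g − g[X/0]) · X^{-1} + g[X/0] from Table 2 leaves the X-free part g[X/0] untouched and shifts every other monomial down by one in X. The following sketch (our own dict-based representation) shows this, including how the mass at x = 1 merges into x = 0:

```python
from fractions import Fraction


def decrement(g, var):
    """x-- on a finitely supported PGF: (g - g[X/0]) * X^{-1} + g[X/0].

    Monomials with X-exponent 0 (the part g[X/0]) are kept unchanged; all
    others are shifted one step "to the left" in dimension X.
    """
    out = {}
    for s, c in g.items():
        e = s[var] - 1 if s[var] > 0 else 0
        t = s[:var] + (e,) + s[var + 1:]
        out[t] = out.get(t, 0) + c
    return out


# Variables ordered (x, y): g = 1/4*Y + 1/4*X*Y + 1/2*X^3.
g = {(0, 1): Fraction(1, 4), (1, 1): Fraction(1, 4), (3, 0): Fraction(1, 2)}
result = decrement(g, 0)   # 1/2*Y + 1/2*X^2
```

On closed forms, a computer algebra system would instead evaluate the formula directly, which is where the multiplication by X^{-1} is convenient.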


The iid Statement. The semantics of x += iid(D, y) relies on the fact that

$$T_1 \sim \llbracket D \rrbracket,\ \ldots,\ T_n \sim \llbracket D \rrbracket \text{ i.i.d.} \quad\text{implies}\quad \sum_{i=1}^{n} T_i \sim \llbracket D \rrbracket^{n}, \qquad (2)$$

where X ∼ g means that the r.v. X is distributed according to PGF g (see, e.g., [55,
p. 450]). The iid statement generalizes this observation further: If n is not a
constant but a random (program) variable y with PGF h(Y), then we perform
the substitution h[Y/⟦D⟧] (i.e., replace Y by ⟦D⟧ in h) to obtain the PGF of the
sum of y-many i.i.d. samples from D. We slightly modify this substitution to
g[Y/Y·⟦D⟧[T/X]] in order to (i) not alter y, and (ii) account for the increment
to x. For example,


    // 0.2 + 0.3Y + 0.5Y^2

    x += iid(bernoulli(0.5), y)

    // 0.2 + 0.3Y(0.5 + 0.5X) + 0.5Y^2(0.5 + 0.5X)^2

    // 0.2 + 0.15Y + 0.125Y^2 + 0.15XY + 0.25XY^2 + 0.125X^2Y^2 .
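The substitution g[Y/Y·⟦D⟧[T/X]] can be reproduced for polynomial inputs in a few lines of Python: every monomial c·X^a·Y^b is multiplied by ⟦D⟧(X)^b while its Y-exponent is kept. The sketch below assumes that both g and ⟦D⟧ have finite support (true for bernoulli, not for geometric), and its names are our own; it recomputes the example above exactly.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Product of two univariate polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def iid_increment(g, d, x, y):
    """x += iid(D, y) as the substitution g[Y / Y * [[D]][T/X]].

    g: dict from exponent tuples to coefficients (finite support);
    d: coefficient list of the PGF [[D]] in T (assumed to be a polynomial);
    x, y: positions of the variables x and y in the exponent tuples (x != y).
    """
    out = {}
    for s, c in g.items():
        power = [Fraction(1)]
        for _ in range(s[y]):                  # [[D]] raised to the Y-exponent
            power = poly_mul(power, d)
        for k, dk in enumerate(power):         # T^k becomes X^k
            if dk:
                t = s[:x] + (s[x] + k,) + s[x + 1:]
                out[t] = out.get(t, 0) + c * dk
    return out


# The example above: g = 0.2 + 0.3*Y + 0.5*Y^2 with variables ordered (x, y),
# and D = bernoulli(0.5), i.e., [[D]] = 0.5 + 0.5*T.
g = {(0, 0): Fraction(1, 5), (0, 1): Fraction(3, 10), (0, 2): Fraction(1, 2)}
result = iid_increment(g, [Fraction(1, 2), Fraction(1, 2)], x=0, y=1)
```

The resulting dict encodes 0.2 + 0.15Y + 0.125Y^2 + 0.15XY + 0.25XY^2 + 0.125X^2Y^2, matching the last annotation line.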


The while-Loop. The fixed point semantics of the while loop is standard [42, 44]
and reflects the intuitive unrolling rule, namely that while (φ) {P} is equivalent
to if (φ) {P ; while (φ) {P}} else {skip}. Indeed, the fixed point formula in
Table 2 can be derived using the semantics of if discussed above. We revisit this
fixed point characterization in Sect. 5.1.
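The least fixed point can be approached from below by Kleene iteration, ψ_0 = 0 and ψ_{i+1} = Ψ_{φ,P}(ψ_i), which is justified by the continuity of Ψ_{φ,P} discussed in Sect. 5.1. The Python sketch below does this for the hypothetical loop while (x < 1) { {x := 1} [1/2] {y := y + 1} }, with the probabilistic choice written as in the examples of Sect. 6, on finitely supported distributions over (x, y) stored as dicts. It is an illustration of the fixed point formula only, not the invariant-based method of this paper: each approximant ψ_i(g) is a sub-PGF whose missing mass shrinks as i grows.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def add_into(out, dist, scale=1):
    """out += scale * dist, for dicts mapping states to probabilities."""
    for s, c in dist.items():
        out[s] = out.get(s, 0) + scale * c


def body(f):
    """{x := 1} [1/2] {y := y + 1} on a distribution over states (x, y)."""
    out = {}
    for (x, y), c in f.items():
        add_into(out, {(1, y): c}, HALF)
        add_into(out, {(x, y + 1): c}, HALF)
    return out


def guard_part(f):
    """f_{x<1}: the part of f on which the loop guard x < 1 holds."""
    return {s: c for s, c in f.items() if s[0] < 1}


def kleene(i):
    """The i-th Kleene approximant of lfp Psi for while (x < 1) {body}."""
    if i == 0:
        return lambda f: {}                  # least element: maps everything to 0
    prev = kleene(i - 1)

    def psi(f):
        guarded = guard_part(f)
        out = {s: c for s, c in f.items() if s not in guarded}  # f - f_{x<1}
        add_into(out, prev(body(guarded)))                       # psi_{i-1}([[P]](f_{x<1}))
        return out

    return psi


start = {(0, 0): Fraction(1)}    # point mass x = 0, y = 0
approx = kleene(5)(start)
```

Here `approx` places mass 1/2^{k+1} on (x, y) = (1, k) for k = 0, ..., 3, for a total of 15/16; in the limit the mass reaches 1, so this loop is AST on this input in the sense of Definition 4.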


Properties of ⟦P⟧. Our PGF semantics has the property that all programs except
while loops are able to operate on the input PGF in (rational) closed
form, i.e., they never have to expand the input as an infinite series (which is of
course impossible in practice). More formally:


Theorem 1 (Closed-Form Preservation). Let P be a loop-free ReDiP program,
and let g = h/f ∈ PGF be in rational closed form. Then we can compute a
rational closed form of ⟦P⟧(g) ∈ PGF by applying the transformations in Table 2.


The proof is by induction over the structure of P, noticing that all the nec-
essary operations (substitution, differentiation, etc.) preserve rational closed
forms, see [18, Appx. D]. Already a slight extension of our syntax, e.g., admitting
non-rectangular guards, means that closed forms are no longer preserved, see [18,
Appx. B]. Moreover, ⟦P⟧ has the following healthiness [46] properties:


Theorem 2 (Properties of ⟦P⟧). The PGF transformer ⟦P⟧ is


- a well-defined function PGF → PGF,
- continuous, i.e., ⟦P⟧(sup Γ) = sup ⟦P⟧(Γ) for all chains Γ ⊆ PGF,
- linear, i.e., ⟦P⟧(Σ_{σ∈N^k} g(σ)X^σ) = Σ_{σ∈N^k} g(σ)⟦P⟧(X^σ) for all g ∈ PGF.


4.3 Probabilistic Termination 


Due to the presence of possibly unbounded while-loops, a ReDiP-program does 
not necessarily halt, or may do so only with a certain probability. Our semantics 
naturally captures the termination probability. 


Definition 4 (AST). A ReDiP-program P is called almost-surely terminating
(AST) for PGF g if ⟦P⟧(g)[X/1] = g[X/1], i.e., if it does not leak probability
mass. P is called universally AST (UAST) if it is AST for all g ∈ PGF.


Note that all loop-free ReDiP-programs are UAST. In this paper, (U)AST
only plays a minor role. Nonetheless, the proof rule below yields a stronger
result (cf. Lemma 2) if the program is UAST. There exist various techniques
and tools for proving (U)AST [17, 47, 50].


5 Reasoning About Loops 


We now focus on loopy programs L = while (φ) {P}. Recall from Table 2 that
⟦L⟧: PGF → PGF is defined as the least fixed point of a higher-order functional


    Ψ_{φ,P}: (PGF → PGF) → (PGF → PGF).


Following [42], we show that Ψ_{φ,P} is sufficiently well-behaved to allow reasoning
about loops by fixed point induction.
5.1 Fixed Point Induction


To apply fixed point induction, we need to lift our domain PGF from Sect. 4.1 by
one order to (PGF → PGF), the domain of PGF transformers. This is because the
functional Ψ_{φ,P} operates on PGF transformers and can thus be seen as a
second-order function (this point of view regards PGF as first-order objects). Recall that,
in contrast to this, the function ⟦P⟧ is first-order: it is just a PGF transformer.
The order on (PGF → PGF) is obtained by lifting the order ⊑ on PGF pointwise
(we denote it with the same symbol ⊑). This implies that (PGF → PGF) is
also an ω-complete partial order. We can then show that Ψ_{φ,P} (see Table 2) is
a continuous function. With these properties, we obtain the following induction
rule for upper bounds on ⟦L⟧, cf. [42, Theorem 6]:


Lemma 1 (Fixed Point Induction for Loops). Let L = while (φ) {P} be
a ReDiP-loop. Further, let ψ: PGF → PGF be a PGF transformer. Then


    Ψ_{φ,P}(ψ) ⊑ ψ   implies   ⟦L⟧ ⊑ ψ.


The goal of the rest of the paper is to apply the rule from Lemma 1 in practice.
To this end, we must somehow specify an invariant such as ψ by finite means.
Since ψ is of type (PGF → PGF), we consider ψ as a program I, more specifically
a ReDiP-program, and identify ψ = ⟦I⟧. Further, by definition


    Ψ_{φ,P}(⟦I⟧) = ⟦if (φ) {P ; I} else {skip}⟧,


and thus the term Ψ_{φ,P}(⟦I⟧) is also a PGF transformer expressible as a ReDiP-
program. These observations and Lemma 1 imply the following:


Lemma 2. Let L = while (φ) {P} and I be ReDiP-programs. Then


    ⟦if (φ) {P ; I} else {skip}⟧ ⊑ ⟦I⟧   implies   ⟦L⟧ ⊑ ⟦I⟧.      (3)

Further, if L is UAST (Definition 4), then

    ⟦if (φ) {P ; I} else {skip}⟧ = ⟦I⟧   iff   ⟦L⟧ = ⟦I⟧.          (4)


Lemma 2 effectively reduces checking whether ψ, given as a ReDiP-program I,
is an invariant of L to checking equivalence of if (φ) {P ; I} else {skip} and
I, provided L is UAST. If I is loop-free, then the latter two programs are both
loop-free and we are left with the task of proving whether they yield the same
output distribution for all inputs. We now present a solution to this problem.


5.2 Deciding Equivalence of Loop-free Programs 


Even in the absence of loops, deciding if two given ReDiP-programs are equivalent
is non-trivial as it requires reasoning about infinitely many (possibly
infinite-support) distributions on program variables. In this section, we first show that
⟦P_1⟧ = ⟦P_2⟧ is decidable for loop-free ReDiP programs P_1 and P_2, and then use
this result together with Lemma 2 to obtain the main result of this paper.
SOP: Second-Order PGF. Our goal is to check if ⟦P_1⟧(g) = ⟦P_2⟧(g) for all
g ∈ PGF. To tackle this, we encode whole sets of PGF into a single object:
an FPS we call second-order PGF (SOP). To define SOP, we need a slightly
more flexible view on FPS. Recall from Definition 1 that a k-dim. FPS is an
array f: N^k → R. Such an f can be viewed equivalently as an l-dim. array with
(k − l)-dim. arrays as entries. In the formal sum notation, this is reflected by
partitioning X = (Y, Z) and viewing f as an FPS in Y with coefficients that are
FPS in the other indeterminates Z. For example,


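$$\begin{aligned}
\frac{1}{(1-Y)(1-Z)} &= 1 + Y + Z + Y^2 + YZ + Z^2 + \cdots \\
&= (1-Z)^{-1} + (1-Z)^{-1}\,Y + (1-Z)^{-1}\,Y^2 + \cdots,
\end{aligned}$$


where in the lower line the coefficients (1 − Z)^{-1} are considered elements of R[[Z]].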


Definition 5 (SOP). Let U and X be disjoint sets of indeterminates. A formal
power series f ∈ R[[U, X]] is a second-order PGF (SOP) if


$$f = \sum_{\tau \in \mathbb{N}^{|U|}} f(\tau)\,U^{\tau} \ \ (\text{with } f(\tau) \in \mathbb{R}[[X]]) \quad\text{implies}\quad \forall \tau\colon\ f(\tau) \in \mathsf{PGF}.$$

That is, an SOP is simply an FPS whose coefficients are PGF: instead of
generating a sequence of probabilities as PGF do, it generates a sequence of
distributions. An (important) example SOP is


$$f_{\mathit{dirac}} = (1 - XU)^{-1} = 1 + XU + X^2U^2 + \cdots \in \mathbb{R}[[U, X]], \qquad (5)$$


i.e., for all i ≥ 0, f_dirac(i) = X^i = ⟦dirac(i)⟧[T/X]. As a second example consider
f_binom = f_dirac[X/0.5 + 0.5X]; it is clear that f_binom(i) = (0.5 + 0.5X)^i =
⟦binomial(0.5, i)⟧[T/X] for all i ≥ 0. Note that if U = ∅, then SOP and PGF coincide.
For fixed X and U, we denote the set of all second-order PGF by SOP.
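To see the SOP f_binom at work, one can expand its closed form (1 − (0.5 + 0.5X)U)^{-1} as a bivariate power series and read off the coefficient of each U^i; every such coefficient should be the PGF (0.5 + 0.5X)^i of binomial(0.5, i). The Python sketch below does this by truncated power-series inversion with exact fractions. It is our own illustration; PRODIGY instead manipulates such closed forms symbolically through its computer algebra backends.

```python
from fractions import Fraction


def series_inverse_2d(d, n):
    """Coefficients s[(a, b)] of U^a X^b in 1/d, for a, b < n.

    d maps (U-exponent, X-exponent) to coefficients and needs d[(0, 0)] != 0.
    """
    s = {}
    for a in range(n):
        for b in range(n):
            # Coefficient (a, b) of s * d must be 1 at (0, 0) and 0 elsewhere.
            acc = Fraction(1) if (a, b) == (0, 0) else Fraction(0)
            for (i, j), c in d.items():
                if (i, j) != (0, 0) and i <= a and j <= b:
                    acc -= c * s[(a - i, b - j)]
            s[(a, b)] = acc / d[(0, 0)]
    return s


# f_binom = f_dirac[X / 0.5 + 0.5X] = 1 / (1 - (0.5 + 0.5X) U)
half = Fraction(1, 2)
denominator = {(0, 0): Fraction(1), (1, 0): -half, (1, 1): -half}
N = 6
f_binom = series_inverse_2d(denominator, N)

# Row i (the coefficient of U^i) is a PGF in X; list its first N coefficients.
rows = [[f_binom[(i, k)] for k in range(N)] for i in range(N)]
```

Row i contains the binomial(0.5, i) probabilities C(i, k)/2^i followed by zeros, so each U-coefficient is indeed a PGF, as Definition 5 requires.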


SOP Semantics of ReDiP. The appeal of SOP is that, syntactically, they are
still formal power series, and some can be represented in closed form just like
PGF. Moreover, we can readily extend our PGF transformer ⟦P⟧ to an SOP
transformer ⟦P⟧: SOP → SOP. A key insight of this paper is that, without any
changes to the rules in Table 2, applying ⟦P⟧ to an SOP is the same as applying
⟦P⟧ simultaneously to all the PGF it subsumes:


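Theorem 3. Let P be a ReDiP-program. The transformer ⟦P⟧: SOP → SOP is
well-defined. Further, if f = Σ_{τ∈N^{|U|}} f(τ)U^τ is an SOP, then


$$\llbracket P \rrbracket(f) = \sum_{\tau \in \mathbb{N}^{|U|}} \llbracket P \rrbracket\bigl(f(\tau)\bigr)\,U^{\tau}.$$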


An SOP Transformation for Proving Equivalence. We now show how to
exploit Theorem 3 for equivalence checking. Let P_1 and P_2 be (loop-free) ReDiP-
programs; we are interested in proving whether ⟦P_1⟧ = ⟦P_2⟧. By linearity it
holds that ⟦P_1⟧ = ⟦P_2⟧ iff ⟦P_1⟧(X^σ) = ⟦P_2⟧(X^σ) for all σ ∈ N^k, i.e., to check
equivalence it suffices to consider all (infinitely many) point-mass PGF as inputs.
Lemma 3 (SOP-Characterisation of Equivalence). Let P_1 and P_2 be
ReDiP-programs with Vars(P_i) ⊆ {x_1, ..., x_k} for i ∈ {1, 2}. Further, consider a
vector U = (U_1, ..., U_k) of meta indeterminates, and let g_X be the SOP


$$g_X = (1 - X_1U_1)^{-1}(1 - X_2U_2)^{-1} \cdots (1 - X_kU_k)^{-1} \in \mathbb{R}[[U, X]].$$

Then ⟦P_1⟧ = ⟦P_2⟧ if and only if ⟦P_1⟧(g_X) = ⟦P_2⟧(g_X).


The proof of Lemma 3 (see [18, Appx. F.5]) relies on Theorem 3 and the fact
that the rational SOP g_X generates all (multivariate) point-mass PGF; in fact
it holds that g_X = Σ_{σ∈N^k} X^σ U^σ, i.e., g_X generalizes f_dirac from (5). It follows:


Lemma 4. ⟦P_1⟧ = ⟦P_2⟧ is decidable for loop-free ReDiP-programs P_1, P_2.
Our main theorem follows immediately from Lemmas 2 and 4:


Theorem 4. Let L = while (φ) {P} be UAST with loop-free body P, and let I be a
loop-free ReDiP-program. It is decidable whether ⟦L⟧ = ⟦I⟧.
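By linearity, equivalence of two loop-free programs amounts to agreement on all point masses X^σ, and Lemma 3 packs these infinitely many inputs into the single SOP g_X. The Python sketch below only illustrates the point-mass view on a finite box of inputs, for hand-translated versions of x := iid(bernoulli(p), y) and x := binomial(p, y) from Sect. 3.3 (the function names are hypothetical). Such a bounded check can refute equivalence but never prove it; closing that gap is precisely what the SOP construction achieves.

```python
from fractions import Fraction
from math import comb


def iid_bernoulli(p):
    """Hypothetical program x := iid(bernoulli(p), y), as repeated convolution."""
    def run(state):
        _, y = state
        dist = {0: Fraction(1)}
        for _ in range(y):
            nxt = {}
            for v, c in dist.items():
                nxt[v] = nxt.get(v, 0) + c * (1 - p)
                nxt[v + 1] = nxt.get(v + 1, 0) + c * p
            dist = nxt
        return {(v, y): c for v, c in dist.items()}
    return run


def binomial_assign(p):
    """Hypothetical program x := binomial(p, y), via the closed-form pmf."""
    def run(state):
        _, y = state
        return {(k, y): comb(y, k) * p**k * (1 - p) ** (y - k) for k in range(y + 1)}
    return run


def agree_on_box(prog_a, prog_b, bound):
    """Compare outputs on all point masses X^sigma with sigma in {0..bound-1}^2.

    Passing this test does NOT prove equivalence; it only samples finitely
    many of the infinitely many point-mass inputs.
    """
    return all(prog_a((x, y)) == prog_b((x, y))
               for x in range(bound) for y in range(bound))


p = Fraction(1, 3)
same = agree_on_box(iid_bernoulli(p), binomial_assign(p), 6)
different = agree_on_box(iid_bernoulli(p), binomial_assign(Fraction(1, 2)), 6)
```

Here `same` is true, while `different` is false because the outputs already differ on the point mass with y = 1.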


Example 2. In Fig. 2 we prove that the two UAST programs L and I


    L:  while (n > 0) { {n := n − 1} [1/2] {c := c + 1} }
    I:  c += iid(geometric(1/2), n); n := 0


[Fig. 2: annotated SOP computations of if (n > 0) {P ; I} else {skip} and of I on the
input SOP (1 − NU)^{-1}(1 − CV)^{-1}; both computations end in (1 − CV)^{-1}(2 − C)(2 − C − U)^{-1}.]


Fig. 2. Program equivalence follows from the equality of the resulting SOP (Lemma 3). 


are equivalent (i.e., ⟦L⟧ = ⟦I⟧) by showing that ⟦if (n > 0) {P ; I} else {skip}⟧ = ⟦I⟧
as suggested by Lemma 2. The latter is achieved as in Lemma 3: We run both
programs on the input SOP g_{NC} = (1 − NU)^{-1}(1 − CV)^{-1}, where U, V are
meta indeterminates corresponding to N and C, respectively, and check if the
results are equal. Note that I is the loop-free specification from Example 1; thus
by transitivity, the loop L is equivalent to the loop in Fig. 1. ◁
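The equality claimed in Example 2 can be retraced by hand using only the rules of Table 2. In the following derivation we abbreviate A = (1 − CV)^{-1} and F = (2 − C)(2 − C − U)^{-1} (abbreviations of ours), and P denotes the loop body {n := n − 1} [1/2] {c := c + 1}. Executing I first substitutes N by N·⟦geometric(1/2)⟧[T/C] = N/(2 − C) and then sets N to 1, so overall it replaces N by 1/(2 − C); decrementing n on the part with n ≥ 1 is multiplication by N^{-1}, and incrementing c is multiplication by C.

```latex
\begin{align*}
\llbracket I \rrbracket(g_{NC})
  &= A\,\Bigl(1 - \tfrac{U}{2-C}\Bigr)^{-1} = A F, \\
(g_{NC})_{n<1} &= g_{NC}[N/0] = A,
  \qquad g_{NC} - A = A\bigl((1-NU)^{-1} - 1\bigr) = A\,NU(1-NU)^{-1}, \\
\llbracket P \rrbracket(g_{NC} - A)
  &= \tfrac12\, A\,U(1-NU)^{-1} + \tfrac12\, C A\bigl((1-NU)^{-1} - 1\bigr), \\
\llbracket I \rrbracket\bigl(\llbracket P \rrbracket(g_{NC} - A)\bigr)
  &= \tfrac12\, A U F + \tfrac12\, C A (F - 1), \\
\llbracket \texttt{if}\ (n > 0)\ \{P \mathbin{;} I\}\ \texttt{else}\ \{\texttt{skip}\} \rrbracket(g_{NC})
  &= A + \tfrac12\, A U F + \tfrac12\, C A (F - 1)
   = A\Bigl(\tfrac{2-C}{2} + \tfrac{(U+C)(2-C)}{2(2-C-U)}\Bigr) \\
  &= A\,\frac{2-C}{2}\cdot\frac{2}{2-C-U} = A F .
\end{align*}
```

Both sides thus equal A F = (1 − CV)^{-1}(2 − C)(2 − C − U)^{-1}, the final SOP reported for Fig. 2.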
6 Case Studies


We have implemented our techniques in Python as a prototype called PRODIGY³:
PRObability DIstributions via GeneratingfunctionologY. By interfacing with dif-
ferent computer algebra systems (CAS), e.g., SymPy [49] and GiNaC [10, 57], as
backends for symbolic computation of PGF and SOP semantics, PRODIGY
decides whether a given probabilistic loop agrees with an (invariant) specifica- 
tion encoded as a loop-free ReDiP program. Furthermore, it supports efficient 
queries on various quantities associated with the output distribution. 

In what follows, we demonstrate in particular the applicability of our tech- 
niques to programs featuring stochastic dependency, parametrization, and nested 
loops. The examples are all presented in the same way: the iterative program 
on the left side and its corresponding specification on the right. The presented 
programs are all UAST, provided that the parameters are instantiated from a suitable
value domain.⁴ For each example, we report the time for performing the equiva-
lence check on a 2.4 GHz Intel i5 Quad-Core processor with 16 GB RAM running
macOS Monterey 12.0.1. Additional examples can be found in [18, Appx. E]. 


while (c > 0) {                             if (c > 0) {
    {n := n + 1} [1/2] {m := m + 1};            tmp := binomial(1/2, c);
    c := c − 1;                                 m += tmp;
    tmp := 0                                    n += c − tmp;
}                                               c := 0;
                                                tmp := 0
                                            }


Fig. 3. Generating complementary binomial distributions (for n,m) by coin flips. 
binomial(1/2, c) is an alias for iid(bernoulli(1/2), c). 


while (c = 1 ∧ t ≤ 1) {                     if (c = 1 ∧ t ≤ 1) {
    if (t = 0) {                                c := 0;
        {c := 0} [a] {t := 1}                   if (t = 0) {
    } else {                                        t := bernoulli((1 − a)b/(a + b − ab))
        {c := 0} [b] {t := 0}                   } else {
    }                                               t := bernoulli(b/(a + b − ab))
}                                               }
                                            }


Fig. 4. A program modeling two dueling cowboys with parametric hit probabilities. 


Example 3 (Complementary Binomial Distributions). We show that the program
in Fig. 3 generates a joint distribution on n, m such that both n and m
are binomial(1/2, c)-distributed and are complementary in the sense
that n + m = c holds certainly (if n = m = 0 initially; otherwise the variables

³ https://github.com/LKlinke/Prodigy.
⁴ Parameters of Example 4 have to be instantiated with a probability value in (0,1).
are incremented by the corresponding amounts). PRODIGY automatically checks
that the loop agrees with the specification in 18.3 ms. The resulting distribution
can then be analyzed for any given input PGF g by computing ⟦I⟧(g), where
I is the loop-free program. For example, for input g = C^{10}, the distribution
as computed by PRODIGY has the factorized closed form ((M + N)/2)^{10}. The CAS
backends exploit such factorized forms to perform algebraic manipulations more
efficiently compared to fully expanded forms. For instance, we can evaluate the
queries E[m³ + 2mn + n²] = 235 or Pr(m > 7 ∧ n < 3) = 7/128 almost instantly.


Example 4 (Dueling Cowboys [46]). The program in Fig. 4 models a duel of
two cowboys with parametric hit probabilities a and b. Variable t indicates the
cowboy who is currently taking his shot, and c monitors the state of the duel 
(c = 1: duel is still running, c = 0: duel is over). PRODIGY automatically verifies 
the specification in 11.97 ms. We defer related problems — e.g., synthesizing 
parameter values to meet a parameter-free specification — to future work. 
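The loop-free specification in Fig. 4 can be cross-checked numerically. Unrolling the loop many times for concrete hit probabilities should reproduce the success probabilities of the two bernoulli statements. The Python sketch below is our own check, using floating-point arithmetic and a fixed, arbitrarily chosen number of unrollings. Starting from c = 1 and a given t, it tracks the mass still inside the loop and the mass that exits with t = 1:

```python
def duel_exit_t1(a, b, t0, unrollings=2000):
    """Pr(loop exits with t = 1) from c = 1, t = t0, by unrolling the loop."""
    inside = {0: 0.0, 1: 0.0}
    inside[t0] = 1.0                 # mass still in the loop (c = 1), keyed by t
    exited_t1 = 0.0
    for _ in range(unrollings):
        nxt = {0: 0.0, 1: 0.0}
        for t, p in inside.items():
            hit = a if t == 0 else b
            if t == 1:
                exited_t1 += p * hit        # {c := 0} while t = 1: cowboy 1 hits
            nxt[1 - t] += p * (1 - hit)     # miss: the other cowboy shoots next
        inside = nxt
    return exited_t1


def spec_t1(a, b, t0):
    """Success probability of the bernoulli statement chosen by the specification."""
    if t0 == 0:
        return (1 - a) * b / (a + b - a * b)
    return b / (a + b - a * b)


cases = [(a, b, t0) for a, b in [(0.3, 0.4), (0.5, 0.5), (0.9, 0.1)] for t0 in (0, 1)]
errors = [abs(duel_exit_t1(a, b, t0) - spec_t1(a, b, t0)) for a, b, t0 in cases]
```

The remaining mass inside the loop shrinks by the factor (1 − a)(1 − b) every two unrollings, so the errors are at the level of floating-point rounding for these parameter values.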


while (x > 0) {                          if (y = 1) {                 if (x > 0) {
    y := 1;                                  x += geometric(1/2);         c := iid(catalan(1/2), x);
    while (y = 1) {                          y := 0                       x := 0;
        {y := 0} [1/2] {x := x + 1}      }                                y := 0
    };                                                                }
    x := x − 1;
    c += 1
}


Fig. 5. Nested loops with invariants for the inner and outer loop. 


Example 5 (Nested Loops). The inner loop of the program in Fig. 5 modifies x,
which influences the termination behavior of the outer loop. Intuitively, the pro-
gram models a random walk on N: In every step, the value of the current position
x changes by some random δ ∈ {−1, 0, 1, 2, ...} such that δ + 1 is geometrically
distributed. The example demonstrates how our technique enables compositional 
reasoning. We first provide a loop-free specification for the inner loop, prove its 
correctness, and then simply replace the inner loop by its specification, yielding 
a program without nested loops. This feature is a key benefit of reusing the 
loop-free fragment of ReDiP as a specification language. Moreover, existing tech- 
niques that cannot handle nested loops can profit from it; in fact, we can prove 
the overall program to be UAST using the rule of [47]. Interestingly, the outer 
loop has infinite expected runtime (for any input distribution where the proba- 
bility that x > 0 is positive). We can prove this by querying the expected value of 
the program variable c in the resulting output distribution. The automatically
computed result is ∞, which indeed proves that the expected runtime of this
program is not finite. This example furthermore shows that our technique can
be generalized beyond rational functions, since the PGF of the catalan(p) dis-
tribution is (1 − √(1 − 4p(1 − p)T))/(2p), i.e., algebraic but not rational. We leave
a formal generalization of the decidability result from Theorem 4 to algebraic
functions for future work. PRODIGY verifies this example in 29.17 ms. ◁
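The catalan(p) PGF stated above can be sanity-checked against the classical Catalan generating function Σ_k C_k z^k = (1 − √(1 − 4z))/(2z). Substituting z = p(1 − p)T shows that T^{k+1} has coefficient C_k p^k (1 − p)^{k+1}. The Python sketch below is our own floating-point check. It compares a truncated series with the closed form at a point inside the radius of convergence. It also shows that for p = 1/2 the partial sums of the mean keep growing, which is consistent with the computed result ∞ for the expected value of c.

```python
from math import comb, sqrt


def catalan(k):
    """k-th Catalan number C_k."""
    return comb(2 * k, k) // (k + 1)


def catalan_pgf(p, t):
    """Closed form (1 - sqrt(1 - 4p(1-p)T)) / (2p), evaluated at T = t."""
    return (1 - sqrt(1 - 4 * p * (1 - p) * t)) / (2 * p)


def catalan_series(p, t, terms):
    """Truncated series: sum of C_k p^k (1-p)^(k+1) t^(k+1) for k < terms."""
    return sum(catalan(k) * p**k * (1 - p) ** (k + 1) * t ** (k + 1)
               for k in range(terms))


def partial_mean(p, terms):
    """Partial sums of the mean: sum of (k+1) * Pr(value = k+1) for k < terms."""
    return sum((k + 1) * catalan(k) * p**k * (1 - p) ** (k + 1) for k in range(terms))


gap = abs(catalan_series(0.5, 0.5, 200) - catalan_pgf(0.5, 0.5))
means = [partial_mean(0.5, n) for n in (10, 100, 400)]   # keeps growing with n
```

For p = 1/2, the n-th partial mean simplifies to n·C(2n, n)/4^n, which grows roughly like √(n/π) and is about 11.3 for n = 400. The mean is therefore unbounded.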


Scalability Issue. It is not difficult to construct programs on which PRODIGY scales
poorly: its performance depends highly on the number of consecutive probabilistic
branches and the size of the constant n in guards (requiring n-th order derivatives
of PGF, cf. Table 2).


7 Related Work 


This section surveys research efforts that are highly related to our approach in 
terms of semantics, inference, and equivalence checking of probabilistic programs. 


Forward Semantics of Probabilistic Programs. Kozen established in his seminal 
work [43] a generic way of giving forward, denotational semantics to probabilis- 
tic programs as distribution transformers. Klinkenberg et al. [42] instantiated 
Kozen’s semantics as PGF transformers. We refine the PGF semantics substan- 
tially such that it enjoys the following crucial properties: (i) our PGF transform- 
ers (when restricted to loop-free ReDiP programs) preserve closed-form PGF and 
thus are effectively constructible. In contrast, the existing PGF semantics in [42]
operates on infinite sums in a non-constructive fashion; (ii) our PGF semantics 
naturally extends to SOP, which serves as the key to reason about the exact 
behavior of unbounded loops (under possibly uncountably many inputs) in a 
fully automatic manner. The PGF semantics in [42], however, supports only 
(over-)approximations of looping behaviors and can hardly be automated; and 
(iii) our PGF semantics is capable of interpreting program constructs like i.i.d.
sampling that are of particular interest in practice.


Backward Semantics of Probabilistic Programs. Many verification systems for 
probabilistic programs make use of backward, denotational semantics — most 
pertinently, the weakest preexpectation (WP) calculi [38,46] as a quantitative 
extension of Dijkstra’s weakest preconditions [19]. The WP of a probabilistic 
program C w.r.t. a postexpectation g, denoted by wp⟦C⟧(g)(·), maps every ini-
tial program state σ to the expected value of g evaluated in final states reached
after executing C on σ. In contrast to Dijkstra’s predicate transformer semantics,
which also admits strongest postconditions, the counterpart of “strongest post-
expectations” unfortunately does not exist [36, Chap. 7], so WP calculi are not amenable
to forward reasoning. We remark, in particular, that checking program equiva-
lence via WP is difficult, if not impossible, since it amounts to reasoning about
uncountably many postexpectations g. We refer interested readers to [5, Chaps. 
1-4] for more recent advancements in formal semantics of probabilistic programs. 


Probabilistic Inference. There are a handful of probabilistic systems that employ 
an alternative forward semantics based on probability density function (PDF) 
representations of distributions, e.g., (A)PSI [24,25], AQUA [32], Hakaru [14,52], 
and the density compiler in [11,12]. These systems are dedicated to probabilis-
tic inference for programs encoding continuous distributions (or joint discrete- 
continuous distributions). Reasoning about the underlying PDF representations, 
however, amounts to resolving complex integral expressions in order to answer 
inference queries, thus confining these techniques either to (semi-)numerical 
methods [11,12,14,32,52] or exact methods yet limited to bounded looping 
behaviors [24,25]. Apart from these inference systems, a recently developed lan- 
guage called Dice [31] featuring exact inference for discrete probabilistic pro- 
grams is also confined to statically bounded loops. The tool Mora [7,8] supports 
exact inference for various types of Bayesian networks, but relies on a restricted 
form of intermediate representation known as prob-solvable loops, whose behav- 
iors can be expressed by a system of C-finite recurrences admitting closed-form 
solutions. 


Equivalence of Probabilistic Programs. Murawski and Ouaknine [51] showed an 
EXPTIME decidability result for checking the equivalence of probabilistic pro- 
grams over finite data types by recasting the problem in terms of probabilistic 
finite automata [23,41,56]. Their techniques have been automated in the equiva- 
lence checker APEX [45]. Barthe et al. [4] proved a 2-EXPTIME decidability result 
for checking equivalence of straight-line probabilistic programs (with determinis-
tic inputs and neither loops nor recursion) interpreted over all possible extensions of
a finite field. Barthe et al. [3] developed a relational Hoare logic for probabilistic 
programs, which has been extensively used for, amongst others, proving program 
equivalence with applications in provable security and side-channel analysis. 

The decidability result established in this paper is orthogonal to the afore- 
mentioned results: (i) our decidability for checking L ~ S applies to discrete 
probabilistic programs L with unbounded looping behaviors over a possibly infi- 
nite state space; the specification S — though, admitting no loops — encodes a 
possibly infinite-support distribution; yet as a compromise, (ii) our decidability 
result is confined to ReDiP programs that necessarily terminate almost-surely on 
all inputs, and involve only distributions with rational closed-form PGF. 


8 Conclusion and Future Work 


We showed that it is decidable whether a (possibly unbounded) probabilistic loop is
equivalent to a loop-free specification program, and presented a fully automated
technique to verify this. Future directions include determining the
complexity of our decision problem; amending the method to continuous distri- 
butions using, e.g., characteristic functions; extending the notion of probabilistic 
equivalence to probabilistic refinements; exploring PGF-based counterexample- 
guided synthesis of quantitative loop invariants (see [18, Appx. F.6] for generat- 
ing counterexamples); and tackling Bayesian inference. 


Acknowledgments. The authors thank Philipp Schröer for providing support for his
tool PROBABLY (https://github.com/Philipp15b/Probably), which forms the basis
of our implementation. 
Abstract. Markov decision processes are a ubiquitous formalism for
modelling systems with non-deterministic and probabilistic behavior. 
Verification of these models is subject to the famous state space explosion 
problem. We alleviate this problem by exploiting a hierarchical structure 
with repetitive parts. This structure not only occurs naturally in robotics, 
but also in probabilistic programs describing, e.g., network protocols. 
Such programs often repeatedly call a subroutine with similar behavior. 
In this paper, we focus on a local case, in which the subroutines have 
a limited effect on the overall system state. The key ideas to accelerate 
analysis of such programs are (1) to treat the behavior of the subroutine 
as uncertain and only remove this uncertainty by a detailed analysis if 
needed, and (2) to abstract similar subroutines into a parametric tem- 
plate, and then analyse this template. These two ideas are embedded into 
an abstraction-refinement loop that analyses hierarchical MDPs. A pro- 
totypical implementation shows the efficacy of the approach. 


1 Introduction 


Markov Decision Processes (MDPs) are the model for sequential decision making
under probabilistic uncertainty, and as such are central to modelling randomized
algorithms and distributed systems with lossy channels, and serve as the underlying
formalism in reinforcement learning. A key question in the verification of MDPs
is: What is the maximal probability that some error state is reached? In this
question, one accounts for the probabilistic nature as well as the inherent (potentially
adversarial) nondeterminism of the system. Various state-of-the-art probabilistic
model checkers, such as Storm [20], Prism [27], and Modest [17], implement
a variety of methods that automatically compute such maximal probabilities.
The most widespread are variations of value iteration, which iteratively apply a
transition function to converge towards the requested probability.
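As a point of reference for the value-iteration methods mentioned above, the following is a minimal Python sketch of value iteration for maximal reachability probabilities. It is not the implementation used in Storm, Prism, or Modest; the dictionary-based MDP encoding and the name `max_reach_probability` are hypothetical. The simple `eps` threshold is only a heuristic stopping criterion and does not by itself give a sound error bound.

```python
def max_reach_probability(mdp, targets, eps=1e-10, max_iter=100_000):
    """Value iteration for the maximal probability to reach `targets`.

    `mdp` maps each state to a dict {action: [(successor, probability), ...]}.
    States that never occur as keys are treated as absorbing.
    """
    states = set(mdp) | {t for acts in mdp.values()
                         for succs in acts.values() for t, _ in succs}
    # Start from the trivial lower bound: 1 on targets, 0 elsewhere.
    v = {s: (1.0 if s in targets else 0.0) for s in states}
    for _ in range(max_iter):
        new_v = dict(v)
        delta = 0.0
        for s, actions in mdp.items():
            if s in targets or not actions:
                continue
            # Bellman update: maximise over actions the expected successor value.
            best = max(sum(p * v[t] for t, p in succs) for succs in actions.values())
            delta = max(delta, abs(best - v[s]))
            new_v[s] = best
        v = new_v
        if delta < eps:  # heuristic convergence check, not a sound bound
            break
    return v


mdp = {
    "s0": {"a": [("goal", 0.5), ("fail", 0.5)], "b": [("s1", 1.0)]},
    "s1": {"c": [("goal", 0.7), ("fail", 0.3)]},
}
values = max_reach_probability(mdp, {"goal"})
print(values["s0"])  # 0.7
```

Here, choosing `b` in `s0` and then `c` in `s1` is optimal, so the value of `s0` is 0.7 rather than the 0.5 obtained via action `a`.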


Hierarchical Structure. Despite various successes, the state space explosion 
remains a significant challenge to the model-based analysis of MDPs. To over- 
come this challenge, some approaches exploit symmetries or the parallel composi- 
tion of a system. Other approaches exploit that typically not all paths through a 
system are equally likely and thus aim to find the essential or critical subsystem. 


© The Author(s) 2022
S. Shoham and Y. Vizel (Eds.): CAV 2022, LNCS 13371, pp. 102-123, 2022.
https://doi.org/10.1007/978-3-031-13185-1_6
(a) Repeated invocation of passToken(p):

    p = 0.5; time = 0; N = 3;
    repeat N times {
        time += passToken(p);
        if flip(0.5) {p = 0.8p}
        else {p = 1.25p}
    };
    return time

(b) passToken(p): passing must succeed twice.

    passToken(p):
        t = 1;
        while (not flip(p)) {t++};
        t++;
        while (not flip(p)) {t++};
        return t


Fig. 1. Simplified example for sending a token over an unreliable channel. 
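To make the semantics of Fig. 1 concrete, here is a hedged Python transcription of the two listings as a Monte Carlo simulator. It assumes that `flip(q)` returns true with probability q and that the increment between the two loops of passToken is `t++`; all function names are our own. Under these assumptions, each call of passToken(p) consists of two geometric phases with expected length 1/p each, so its expected return value is 2/p.

```python
import random


def flip(rng, q):
    """Bernoulli coin: True with probability q."""
    return rng.random() < q


def pass_token(rng, p):
    """Fig. 1b: two passes, each retried until it succeeds."""
    t = 1
    while not flip(rng, p):  # retry the first pass
        t += 1
    t += 1
    while not flip(rng, p):  # retry the second pass (acknowledgement)
        t += 1
    return t


def protocol(rng, n=3, p=0.5):
    """Fig. 1a: call pass_token n times while the channel quality drifts."""
    time = 0
    for _ in range(n):
        time += pass_token(rng, p)
        p = 0.8 * p if flip(rng, 0.5) else 1.25 * p
    return time


def estimate(sample, runs=50_000, seed=0):
    """Average of `runs` independent samples drawn with a seeded generator."""
    rng = random.Random(seed)
    return sum(sample(rng) for _ in range(runs)) / runs


token_mean = estimate(lambda rng: pass_token(rng, 0.5))
protocol_mean = estimate(protocol)
print(token_mean, protocol_mean)
```

The first estimate should be close to 2/0.5 = 4, and the second close to the value of roughly 12.3 that the macro-MDP analysis in Sect. 2 yields.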


While we exploit related ideas—a detailed comparison is given in the related 
work, cf. Sect. 7—our approach is fundamentally different and instead exploits a 
hierarchical decomposition natural in many system models. This decomposition 
is captured naturally by probabilistic programs (over discrete bounded variables) 
with non-nested subroutines, where some subroutines are called repeatedly with 
similar arguments. Figure 1 shows the example that we use to demonstrate our
approach in Sect. 2. More generally, we are interested in systems with an overall task
that is achieved by a suitable combination of a limited number of sub-tasks. Such 
a setting occurs naturally, e.g. (i) in robotics, when multiple rooms in a floor 
need to be inspected, or (ii) in routing, when multiple packets need to be routed 
sequentially. The underlying problem structure is also exploited in hierarchical 
planning [5,19,30], where the goal is to find a good but not necessarily optimal 
policy (and induced value). We combine insights from hierarchical planning with 
an abstraction-refinement perspective and then construct an anytime algorithm 
with strict guarantees on the result. 


Local Model-Based Analysis. An adequate operational model for the model-based
analysis of hierarchical systems is a hierarchical MDP, whose state
space can be partitioned into subMDPs. Abstractly, one
can represent a hierarchical MDP by the collection of subMDPs and a macro-
level MDP [19] where the probabilities of outgoing transitions at a state are
described by a corresponding subMDP, cf. Sect. 3.2. In this paper, we focus on
hierarchical MDPs where the policies that are optimal in (only) a subMDP
are optimal (partial) policies in the hierarchical MDP. More intuitively, we can
solve the subMDPs individually, i.e., the solution (w.r.t. the fixed measure) for
the subMDP is part of the globally optimal solution. While this assumption is
restrictive, it is satisfied in various interesting settings. The assumption allows
us to analyse subMDPs out-of-context, i.e., we can first analyse the subMDPs
and then construct the correct macro-MDP, i.e., extract transition probabilities
and rewards from the subMDP analysis. This approach already improves the
maximal memory consumption and allows for additional speed-ups if the same 
subMDP occurs multiple times. 


Epistemic Uncertainty During Computation. The key insight to accelerate the 
outlined approach further is to avoid analysing all subMDPs precisely, while still 
providing sound guarantees on the obtained results. To this end, observe that even
before analysing the subMDPs, we can analyse an uncertain variant of the macro-
level MDP where we do not yet know the associated transition probabilities and
rewards but instead only know intervals. We may then do two things: First,
we can identify the subMDPs which are most critical, i.e., where replacing the
interval by a concrete value yields the most benefit. Second, and more importantly,
we can analyse a set of subMDPs and refine the associated uncertainties, i.e.,
tighten the associated intervals. To support the analysis of sets of subMDPs,
we observe that these subMDPs are often slight variations of each other. In this paper, we
represent them as parameterised instances of a particular template that we
define using parametric MDPs (pMDPs). The resulting intervals can be used to
create an (interval-valued version of the) macro-level MDP. Analysing this gives 
bounds on the expected reward in the hierarchical MDP, and the bounds can be 
refined by analysing the subMDPs more precisely. 


Contributions. In a nutshell, we explicitly allow for uncertainty during the solv- 
ing process to speed up the analysis of hierarchical MDPs. Concretely, we con- 
tribute a scalable approach to solve hierarchical MDPs with many different sub- 
MDPs, in particular when these subMDPs are similar, but not the same. The 
approach resembles an abstraction-refinement loop where we abstract the hier- 
archical MDP in two layers and then refine the analysis of the lower layer to 
get a refined representation of the complete MDP. In every step, we can pro- 
vide absolute error bounds. Our approach interprets the different subMDPs as a 
form of uncertainty. The efficient analysis originates from progress made in the 
analysis of uncertain (or parametric) MDPs, and brings that progress to a novel 
setting. The empirical evaluation with a prototype called LEVEL-UP shows the 
efficacy of the approach. 


2 Overview 


We clarify the approach and its applicability with a motivating example 
that drastically abstracts a token passing process where the channel quality 
varies [12]. 


Setting. Consider the protocol in Fig. 1a, which sends a token N times via a
channel. That channel successfully transmits packets with probability p, where
p varies over time. The subroutine takes t units of time, depending on p.
Specifically, in the model, we alternate between accumulating the required time
and updating the channel quality for N token transmissions and then return the
accumulated time. We aim to compute the expected return value. For the
subroutine, we assume that sending a token is repeated until an acknowledgement
is received, which is abstractly modelled in Fig. 1b and corresponds to the small
Markov chain in Fig. 2a. First, the file must successfully be sent (s0 → s1), then
we start sending acknowledgements. The process terminates (s1 → s2) once an
acknowledgement is received. The complete protocol from Fig. 1 including the
subroutine is reflected by the large Markov chain in Fig. 2b that repeats the
small Markov chain (with different probabilities). This model may be analysed
with standard tools, but for large N (and larger subroutines), the state space 
explosion must be alleviated. 


[Figure 2: (a) MC for passToken(p), with self-loops of probability 1-p; (b) hierarchical MDP, rewards of 1 at states with loops.]

Fig. 2. Ingredients for hierarchical MDPs with the example from Fig. 1. Annotations
reflect subMDPs within the macro-MDPs in Fig. 3.


Macro-MDPs and Enumeration. We thus propose to abstract the hierarchical
model into the macro-level MDP in Fig. 3a. Here, every state corresponds to
an invocation of the subprocess. The reward at the states corresponds to the
expected reward for the complete subprocess. Thus, naively, one may construct
the macro-MDP, analyse all (reachable) subMDPs independently, annotate
the macro-MDP states with the appropriate rewards, and finally analyse the
macro-MDP to obtain a result of ≈ 12.3. This approach avoids representing the
complete hMDP in memory, but it is still restricted to analysing systems
with a limited number of subMDPs.
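The enumeration baseline can be replayed exactly for the running example. This minimal sketch assumes, as follows from Fig. 1b, that the subMDP for channel quality p has expected reward 2/p (consistent with the interval [64/25, 25/4] below). It memoises macro states (level, p) so that equal channel qualities are merged; the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def sub_reward(p):
    """Expected reward of the passToken subMDP (Fig. 2a), assumed to be 2/p."""
    return 2 / p


@lru_cache(maxsize=None)
def macro_expected_reward(p, remaining):
    """Expected accumulated reward of the macro-MDP from channel quality p."""
    if remaining == 0:
        return Fraction(0)
    worse = macro_expected_reward(p * Fraction(4, 5), remaining - 1)
    better = macro_expected_reward(p * Fraction(5, 4), remaining - 1)
    # Reward of the current subMDP plus the average over the fair coin flip.
    return sub_reward(p) + Fraction(1, 2) * (worse + better)


result = macro_expected_reward(Fraction(1, 2), 3)
print(result, float(result))  # 4921/400 12.3025
```

Exact rational arithmetic makes it easy to see that the ≈ 12.3 above is 4 + 4.1 + 4.2025, one summand per invocation of passToken.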


Our Approach. We improve scalability by constructing a parameterized macro-
MDP. Reconsider the rewards for Fig. 3a. The values can be computed via the
graph in Fig. 3d, which plots, for each value of p (x-axis), the
corresponding expected reward E (y-axis) obtained by analysing the subMDP in
Fig. 2a. Intuitively, in our abstraction, we annotate the rewards with lower and
upper bounds rather than exact values. Therefore, we compute bounds on the
rewards by selecting an interval for the values p ∈ [8/25, 25/32], as shown in Fig. 3e.
Conceptually, this means that we analyse a set of subMDPs at once, namely all
subMDPs with p ∈ [8/25, 25/32]. Annotating the corresponding expected rewards,
in this case [64/25, 25/4], then yields the macro-MDP in Fig. 3b. Analysis of this
MDP yields that the overall expected time is in [7.68, 18.75]. We refine these bounds
by analysing subsets of the subMDPs. We may split the values for p into two
sets [8/25, 2/5] and [1/2, 25/32]. Then, we obtain two corresponding intervals on the
expected time in the subMDP as shown in Fig. 3f. Model checking the associated
macro-MDP in Fig. 3c bounds the expected time by [10.12, 14.25]. Technically,
we realize this reasoning using parameter lifting [33].
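The coarse abstraction can be checked in the same way. The sketch below again assumes the subMDP reward 2/p and replaces every subMDP reward by the interval [2/p_hi, 2/p_lo] for p in [p_lo, p_hi], which is valid because 2/p decreases in p. Since the macro-level transition probabilities are fixed in this example, lower and upper bounds can be propagated separately. It only reproduces the coarse bounds [7.68, 18.75]; the refined bounds depend on the macro-MDP of Fig. 3c, which we do not reconstruct here. All names are hypothetical.

```python
from fractions import Fraction


def interval_reward(p_lo, p_hi):
    """Bounds on the subMDP reward 2/p for all p in [p_lo, p_hi]."""
    return 2 / p_hi, 2 / p_lo


def bounded_expected_reward(p, remaining, reward_bounds):
    """(lower, upper) bounds on the macro-MDP expected reward from quality p."""
    if remaining == 0:
        return Fraction(0), Fraction(0)
    lo, hi = reward_bounds(p)
    w_lo, w_hi = bounded_expected_reward(p * Fraction(4, 5), remaining - 1, reward_bounds)
    b_lo, b_hi = bounded_expected_reward(p * Fraction(5, 4), remaining - 1, reward_bounds)
    half = Fraction(1, 2)
    return lo + half * (w_lo + b_lo), hi + half * (w_hi + b_hi)


def coarse(p):
    """Annotate every macro state with the same interval, as in Fig. 3b (p is unused)."""
    return interval_reward(Fraction(8, 25), Fraction(25, 32))


lb, ub = bounded_expected_reward(Fraction(1, 2), 3, coarse)
print(float(lb), float(ub))  # 7.68 18.75
```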
Fig. 3. Visualising the computation of expected rewards for the hMDP from Fig. 2b
using a macro-MDP and interval-based abstractions.


Supported Extensions. For conciseness, this example is necessarily simple. Our 
approach allows nondeterminism, i.e., action-choices, in the macro-MDP and in 
the subMDPs. The subMDPs may have multiple outgoing transitions, but this 
must be combined with a restricted type of nondeterminism in the subMDP: If 
multiple outgoing transitions are present, the macro-MDP has transition prob- 
abilities that depend on the subMDPs. We present a useful extension for reach- 
ability probabilities, see the discussion at the bottom of Sect. 3.3. 


More Examples. The key ingredient of models where the approach excels is a
repetitive task whose characteristics depend on some global state. Two variations are
the expected energy consumption of a robot with slowly degrading components
that, e.g., can be improved by maintenance, and job scheduling with periodically
changing distributions of tasks (e.g., day vs. night).


3 Formal Problem Statement 


We formalize MDPs and hierarchical MDPs (hMDPs) to pose the problem
statement, then identify a subclass of hMDPs, which we call local-policy hMDPs, and
restrict our problem to computing optimal expected rewards in local-policy
hMDPs. Furthermore, we introduce parametric MDPs, as they are key to the
abstraction-refinement procedure later in the paper.
3.1 Background


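Definition 1 (Parametric MDP). A parametric MDP (pMDP) is a tuple
$\mathcal{M} = (S_{\mathcal{M}}, A_{\mathcal{M}}, \iota_{\mathcal{M}}, \vec{x}, P_{\mathcal{M}}, r_{\mathcal{M}}, T_{\mathcal{M}})$ where $S_{\mathcal{M}}$ is a finite set of states, $A_{\mathcal{M}}$ is
a finite set of actions, $\iota_{\mathcal{M}} \in S_{\mathcal{M}}$ is the initial state, $\vec{x} = (x_0, \dots, x_n)$ is a vector
of parameters, $P_{\mathcal{M}} \colon S_{\mathcal{M}} \times A_{\mathcal{M}} \times S_{\mathcal{M}} \to \mathbb{Q}[\vec{x}]$ are the transition probabilities,
$r_{\mathcal{M}} \colon S_{\mathcal{M}} \to \mathbb{Q}[\vec{x}]$ the state rewards, and $T_{\mathcal{M}} \subseteq S_{\mathcal{M}}$ is a set of target states.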


We drop the subscripts whenever possible. MDPs are parametric if $\vec{x} \neq \emptyset$ and
parameter-free otherwise. We omit parameters for parameter-free MDPs. We
recap some standard notions on pMDPs (and MDPs):

For a (parameter) valuation $u \in \mathbb{R}^{\vec{x}}$, the instantiation $\mathcal{M}[u]$ globally substitutes
$P_{\mathcal{M}}(s,a,s')$ with $P_{\mathcal{M}}(s,a,s')(u)$ and $r_{\mathcal{M}}(s)$ with $r_{\mathcal{M}}(s)(u)$. An assignment
$u$ is well-defined if $\mathcal{M}[u]$ constitutes an MDP, i.e., if $\sum_{s'} P_{\mathcal{M}}(s,a,s')(u) \in \{0,1\}$
and $r_{\mathcal{M}}(s)(u) \geq 0$ for each $s \in S$, $a \in A$. We denote the set of all
well-defined assignments with $\mathcal{U}_{\mathcal{M}}$. The set $\mathrm{Act}(s)$ denotes the enabled actions at
state $s$, $\mathrm{Act}(s) = \{a \mid \sum_{s'} P_{\mathcal{M}}(s,a,s') \neq 0\}$. If $|\mathrm{Act}(s)| = 1$ for every $s \in S$,
then the (parametric) MDP is a (parametric) Markov chain (MC). A path $\pi$
is an (in)finite sequence $s_0 \xrightarrow{a_0} s_1 \xrightarrow{a_1} \dots$ with $s_i \in S$, $a_i \in \mathrm{Act}(s_i)$, and
$P(s_i, a_i, s_{i+1}) \neq 0$. For finite $\pi$, $\mathrm{last}(\pi)$ denotes the last state of $\pi$. We use
$[s \to \lozenge T]$ to denote the set of finite paths starting in $s$ that visit $T$ only at the end. The reward $r(\pi)$
along a finite path $\pi$ is the sum of the state rewards, $r(\pi) := \sum_i r(s_i)$.
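A minimal sketch of instantiation and well-definedness, assuming a hypothetical encoding in which every transition probability and reward of the pMDP is a Python function of the valuation. Besides the sum condition above, it also checks that every instantiated probability lies in [0, 1], which is implicit in $\mathcal{M}[u]$ constituting an MDP. Exact fractions avoid spurious failures of the sum test due to floating-point rounding. The example template is the passToken chain with parameter p; state and action names are illustrative.

```python
from fractions import Fraction


def instantiate(P, r, u):
    """M[u]: evaluate every parametric probability and reward at valuation u."""
    return ({k: f(u) for k, f in P.items()},
            {s: f(u) for s, f in r.items()})


def is_well_defined(P, r, u):
    """Check that M[u] is an MDP: sums in {0, 1}, probabilities in [0, 1], rewards >= 0."""
    Pu, ru = instantiate(P, r, u)
    sums = {}
    for (s, a, _), prob in Pu.items():
        if not 0 <= prob <= 1:
            return False
        sums[(s, a)] = sums.get((s, a), 0) + prob
    return (all(total in (0, 1) for total in sums.values())
            and all(value >= 0 for value in ru.values()))


def enabled_actions(P, u, s):
    """Act(s) in M[u]: actions whose outgoing probability mass is non-zero."""
    Pu, _ = instantiate(P, {}, u)
    mass = {}
    for (s1, a, _), prob in Pu.items():
        if s1 == s:
            mass[a] = mass.get(a, 0) + prob
    return {a for a, total in mass.items() if total != 0}


# passToken template with a single parameter p.
P = {
    ("s0", "go", "s0"): lambda u: 1 - u["p"], ("s0", "go", "s1"): lambda u: u["p"],
    ("s1", "go", "s1"): lambda u: 1 - u["p"], ("s1", "go", "s2"): lambda u: u["p"],
}
r = {"s0": lambda u: 1, "s1": lambda u: 1, "s2": lambda u: 0}
print(is_well_defined(P, r, {"p": Fraction(1, 2)}))  # True
print(is_well_defined(P, r, {"p": Fraction(3, 2)}))  # False
```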


Specifications. We consider indefinite-horizon expected rewards, i.e., the expected
accumulated reward until reaching the target states. We refer to [3,32] for a
formal treatment and only introduce notation. The unique probability
measure Pr on sets of paths of a parameter-free Markov chain $\mathcal{M}$ can be defined using the usual cylinder set construction. We define
$\Pr_{\mathcal{M}}(s \to \lozenge T) := \int_{\pi \in [s \to \lozenge T]} \Pr(\pi)\, d\pi$ as the probability to reach a state in $T$ from $s$. We then define
the expected reward until hitting $T$ as $\mathrm{ER}_{\mathcal{M}}(s \to \lozenge T) := \int_{\pi \in [s \to \lozenge T]} \Pr(\pi) \cdot r(\pi)\, d\pi$.
In both definitions, if $s$ is the initial state, we simply write $\dots(\lozenge T)$. For technical
conciseness, we make the standard assumption that target states are reached
with probability 1, which ensures that the integral exists and is finite. (Arbitrary)
reachability probabilities can nevertheless be modelled using rewards.
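For a parameter-free Markov chain in which $T$ is reached with probability 1, and assuming that target states carry reward 0, $\mathrm{ER}_{\mathcal{M}}(s \to \lozenge T)$ is the unique solution of $x_s = r(s) + \sum_{s'} P(s,s')\, x_{s'}$ for $s \notin T$ and $x_s = 0$ for $s \in T$. The following minimal sketch solves this linear system exactly with Gauss-Jordan elimination over fractions. The encoding and names are our own, and it assumes that every non-target state appears as a key of `P`. Applied to the passToken chain of Fig. 2a, it yields 2/p.

```python
from fractions import Fraction


def expected_reward(P, r, targets):
    """Solve x_s = r(s) + sum_{s'} P(s, s') x_{s'} for s not in targets (x = 0 on targets).

    P maps each non-target state to {successor: probability}; r maps states to rewards.
    """
    states = sorted(s for s in P if s not in targets)
    idx = {s: k for k, s in enumerate(states)}
    n = len(states)
    # Augmented matrix [I - Q | r], where Q restricts P to non-target states.
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for s in states:
        k = idx[s]
        A[k][k] += 1
        for t, prob in P[s].items():
            if t in idx:
                A[k][idx[t]] -= Fraction(prob)
        A[k][n] = Fraction(r.get(s, 0))
    # Gauss-Jordan elimination with exact arithmetic.
    for col in range(n):
        pivot = next(row for row in range(col, n) if A[row][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        for row in range(n):
            if row != col and A[row][col] != 0:
                factor = A[row][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return {s: A[idx[s]][n] for s in states}


def pass_token_chain(p):
    """Markov chain of Fig. 2a: reward 1 in s0 and s1, target s2."""
    P = {"s0": {"s0": 1 - p, "s1": p}, "s1": {"s1": 1 - p, "s2": p}}
    return P, {"s0": 1, "s1": 1}


P, r = pass_token_chain(Fraction(1, 2))
print(expected_reward(P, r, {"s2"})["s0"])  # 4
```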


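Policies. In pMDPs, we resolve nondeterminism with policies. In this paper, it
suffices to consider memoryless policies $\sigma \colon S \to A$. The set of such policies is
denoted $\Sigma(\mathcal{M})$. We omit $\mathcal{M}$ if it is clear from the context. It is helpful to also
consider partial policies $\hat\sigma \colon S \rightharpoonup A$. For a pMDP $\mathcal{M}$ and a (partial) policy
$\hat\sigma$, the induced dynamics are described by the induced pMDP $\mathcal{M}[\hat\sigma]$, defined as
$(S_{\mathcal{M}}, A_{\mathcal{M}}, \iota_{\mathcal{M}}, \vec{x}, P, r_{\mathcal{M}}, T_{\mathcal{M}})$, where the transition probabilities are given as

$$P(s,a,s') = \begin{cases} P_{\mathcal{M}}(s,a,s') & \text{if } \hat\sigma(s) = a \text{ or } \hat\sigma(s) \text{ is undefined},\\ 0 & \text{otherwise.} \end{cases}$$

If $\sigma$ is total (not partial), then $\mathcal{M}[\sigma]$ is an MC. We define the maximal expected
reward $\mathrm{ER}^{\max}_{\mathcal{M}}(\lozenge T) := \max_{\sigma \in \Sigma} \mathrm{ER}_{\mathcal{M}[\sigma]}(\lozenge T)$, and say that a policy $\sigma$ is optimal
if $\mathrm{ER}^{\max}_{\mathcal{M}}(\lozenge T) = \mathrm{ER}_{\mathcal{M}[\sigma]}(\lozenge T)$.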
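The induced MDP for a partial policy and the maximal expected reward can be sketched as follows, under the same kind of hypothetical dictionary encoding (the names `induce` and `max_expected_reward` are our own). Value iteration started from 0 approaches $\mathrm{ER}^{\max}$ from below when rewards are non-negative and the targets are reached with probability 1 under every policy; the threshold `eps` is again only a heuristic stopping criterion.

```python
def induce(mdp, partial_policy):
    """M[sigma-hat]: keep only the chosen action wherever the policy is defined."""
    induced = {}
    for s, actions in mdp.items():
        a = partial_policy.get(s)  # None means the policy is undefined at s
        induced[s] = {a: actions[a]} if a is not None else dict(actions)
    return induced


def max_expected_reward(mdp, reward, targets, eps=1e-12, max_iter=100_000):
    """Value iteration for ER^max; successors that are not keys of `mdp` get value 0."""
    v = {s: 0.0 for s in mdp}
    for _ in range(max_iter):
        new_v = {}
        for s, actions in mdp.items():
            if s in targets:
                new_v[s] = 0.0
                continue
            # State reward plus the best expected value over the enabled actions.
            new_v[s] = reward.get(s, 0.0) + max(
                sum(p * v.get(t, 0.0) for t, p in succs) for succs in actions.values())
        delta = max(abs(new_v[s] - v[s]) for s in mdp)
        v = new_v
        if delta < eps:
            break
    return v


mdp = {
    "s0": {"a": [("goal", 1.0)], "b": [("s1", 1.0)]},
    "s1": {"c": [("goal", 1.0)]},
}
reward = {"s0": 1.0, "s1": 3.0}
print(max_expected_reward(mdp, reward, {"goal"})["s0"])                      # 4.0
print(max_expected_reward(induce(mdp, {"s0": "a"}), reward, {"goal"})["s0"])  # 1.0
```

Fixing action `a` in `s0` via the partial policy removes the more rewarding detour through `s1`, which is why the induced MDP has a smaller maximal expected reward.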
Regions and Parametric Model Checking. A set $R$ of valuations is
called a (rectangular) region if $R = \{u \mid u^- \leq u \leq u^+\}$ for adequate bounds
$u^-, u^+ \in \mathbb{R}^{\vec{x}}$, using pointwise inequalities, i.e., $R$ is a Cartesian product of
intervals of parameter values. We also denote this region by $[\![u^-, u^+]\!]$. For
regions, we may compute a lower bound on $\min_{u \in R} \mathrm{ER}^{\max}_{\mathcal{M}[u]}(\lozenge T)$ and an upper
bound on $\max_{u \in R} \mathrm{ER}^{\max}_{\mathcal{M}[u]}(\lozenge T)$ via parameter lifting [33,36].


3.2 Hierarchical MDPs 


We concentrate on solving hierarchical MDPs (hMDPs). We assume that hMDPs 
are parameter-free and that their topology has some additional known structure. 


Definition 2 (Hierarchical MDPs). An MDP $\mathcal{M}$ with a partitioning of its
states $S_{\mathcal{M}} = \biguplus_i S_i$ is a hierarchical MDP if for all $i$,

- there exists a unique $s_i^\star \in S_i$ such that $s_i^\star = \iota_{\mathcal{M}}$ or $\mathrm{pred}_{\mathcal{M}}(s_i^\star) \not\subseteq S_i$, and
- for all $s \in S_i \setminus \{s_i^\star\}$, it holds that $s \neq \iota_{\mathcal{M}}$ and $\mathrm{pred}_{\mathcal{M}}(s) \subseteq S_i$.

The state $s_i^\star$ is called the entry state, which we denote $\mathrm{entry}_i$. States $s \in S_i$ with
$\mathrm{succ}_{\mathcal{M}}(s) \cap S_i = \emptyset$ are called exit states. The set $\mathrm{succ}(i) := \mathrm{succ}_{\mathcal{M}}(S_i) \setminus S_i$
contains the successor states of partition $i$. Let $Y = \max_i |\mathrm{succ}(i)|$. By adding
auxiliary states, we can assume that $|\mathrm{succ}(i)| = Y$ for all $i$. We call partitions
with $|S_i| = 1$ trivial. We use $I := \{i \mid |S_i| > 1\}$ to denote the indices of the
non-trivial partitions. We remark that every MDP can be considered as an hMDP
with only trivial partitions.
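Definition 2 can be checked mechanically. The sketch below uses a hypothetical dictionary encoding (state to {action: [(successor, probability), ...]}): it computes predecessor sets, returns the entry state of every block if the partition is hierarchical (and `None` otherwise), and computes $\mathrm{succ}(i)$. The chain used for illustration consists of two consecutive passToken-like blocks; its state names are our own.

```python
def predecessors(mdp):
    """pred(t): states with a positive-probability transition into t."""
    pred = {}
    for s, actions in mdp.items():
        for succs in actions.values():
            for t, prob in succs:
                if prob > 0:
                    pred.setdefault(t, set()).add(s)
    return pred


def entry_states(mdp, initial, blocks):
    """Return the entry state of each block if Definition 2 holds, else None."""
    pred = predecessors(mdp)
    entries = []
    for block in blocks:
        # Candidates: the initial state or states entered from outside the block.
        candidates = [s for s in block
                      if s == initial or not pred.get(s, set()) <= block]
        if len(candidates) != 1:
            return None
        entries.append(candidates[0])
    return entries


def block_successors(mdp, block):
    """succ(i): successors of the block that lie outside it."""
    return {t for s in block for succs in mdp.get(s, {}).values()
            for t, prob in succs if prob > 0} - block


mdp = {
    "a0": {"go": [("a0", 0.5), ("a1", 0.5)]},
    "a1": {"go": [("a1", 0.5), ("b0", 0.5)]},
    "b0": {"go": [("b0", 0.5), ("b1", 0.5)]},
    "b1": {"go": [("b1", 0.5), ("done", 0.5)]},
}
blocks = [{"a0", "a1"}, {"b0", "b1"}, {"done"}]
print(entry_states(mdp, "a0", blocks))      # ['a0', 'b0', 'done']
print(block_successors(mdp, {"a0", "a1"}))  # {'b0'}
```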


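Problem: Given a (hierarchical) MDP $\mathcal{M}$ with target states $T$ and $\eta \in [0, 1]$,
compute bounds lb, ub with $\mathrm{lb} \leq \mathrm{ER}^{\max}_{\mathcal{M}}(\lozenge T) \leq \mathrm{ub}$ and $\eta \cdot \mathrm{ub} \leq \mathrm{lb}$.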


The naive solution to this problem is to ignore the hierarchical structure and 
solve the MDP monolithically. In this paper, we contribute methods that actively 
exploit the structure of the hierarchical MDPs with |I| > 1. We will make an 
additional assumption on the structure of the hierarchical MDP. 


3.3 Optimal Local Subpolicies and Beyond 


Intuitively, we want to ensure that the optimal policy within the partitions can
be computed locally, i.e., on the partition alone, without taking into account the complete
MDP. To this end, each partition within the MDP can be considered as an
individual MDP. In particular, each $S_i$ induces a subMDP as follows:

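Definition 3 (subMDP). Given a hierarchical MDP $\mathcal{M}$ and partition $S_i$, the
corresponding subMDP is the MDP $\mathcal{M}_i := (S'_i, A_{\mathcal{M}} \cup \{\alpha\}, \iota_i, P_i, r_i, G_i)$ with $S'_i := S_i \cup \mathrm{succ}_{\mathcal{M}}(S_i) \cup \{\bot_i\}$, $\iota_i := \mathrm{entry}_i$, and $P_i$ defined by

$$P_i(s,a,s') := \begin{cases} P_{\mathcal{M}}(s,a,s') & \text{if } s \in S_i \text{ and } a \in A_{\mathcal{M}},\\ 1 & \text{else if } s \notin S_i,\ a = \alpha, \text{ and } s' = \bot_i,\\ 0 & \text{otherwise.} \end{cases}$$

The reward $r_i$ is defined as $r_i(s) = r_{\mathcal{M}}(s)$ if $s \in S_i$ and $r_i(s) = 0$ otherwise, and $G_i := \{\bot_i\}$.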


Thus, for every partition of the hierarchical MDP, the corresponding subMDP
additionally contains the successor states and a unique bottom state, which is the
target state and simplifies our construction later.

Likewise, we can (de)compose memoryless policies for the hierarchical MDP
as a union of policies on the individual subMDPs. We do this only for nontrivial
partitions. Let $\sigma_i \colon S'_i \to A$ denote memoryless policies for $\mathcal{M}_i$ and $\sigma'_i$ the
restriction of $\sigma_i$ to $S_i$; then $\bigsqcup_I \sigma_i \colon S \rightharpoonup A$ is the unique partial policy such
that

$$\Big(\bigsqcup_I \sigma_i\Big)(s) := \sigma'_i(s) \text{ if } s \in S_i,\ i \in I, \quad\text{and}\quad \Big(\bigsqcup_I \sigma_i\Big)(s) := \bot \text{ otherwise.}$$

Intuitively, we want the union of locally optimal policies, which is a partial policy,
to be completable to an optimal total policy.


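Definition 4 (Optimal local subpolicies). Given a hierarchical MDP $\mathcal{M}$
with target states $T$ and optimal policies $\sigma_i \in \Sigma(\mathcal{M}_i)$ for all $i \in I$, the
hierarchical MDP has optimal local subpolicies if for $\hat\sigma = \bigsqcup_I \sigma_i$ it holds that
$\mathrm{ER}^{\max}_{\mathcal{M}}(\lozenge T) = \mathrm{ER}^{\max}_{\mathcal{M}[\hat\sigma]}(\lozenge T)$.

That is, if we collect (locally) optimal policies $\sigma_i$ and apply them to $\mathcal{M}$, we
obtain the MDP $\mathcal{M}[\bigsqcup_I \sigma_i]$. In that MDP, we can pick an optimal policy, and
together with $\bigsqcup_I \sigma_i$ this constitutes an optimal and total policy for $\mathcal{M}$.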


Assumption: The hierarchical MDP has optimal local subpolicies. 


Roughly, the idea now becomes that rather than solving one large MDP with $|S|$
states, we solve $|I|$ MDPs with $|S|/|I|$ states and one MDP with $|I|$ states (assuming
equally-sized and only nontrivial partitions).

The assumption is restrictive, but not unreasonable: A subroutine may have
no nondeterminism, or a finished task may have no influence on any future
task. The following proposition, while obvious, formalizes that:


Proposition 1 (Sufficient criterion). Let $\mathcal{M}$ be a hierarchical MDP. The
MDP has optimal local subpolicies if for each $i \in I$ either

- there is a single successor for the partition, i.e., $|\mathrm{succ}_{\mathcal{M}}(S_i) \setminus S_i| = 1$, or
- there are no choices, i.e., $|\mathrm{Act}(s)| = 1$ for all $s \in S_i$.
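The sufficient criterion of Proposition 1 is equally easy to test. In this hypothetical helper, `mdp` maps states to {action: [(successor, probability), ...]}, every listed action is assumed to be enabled, and `blocks` lists the partitions; trivial partitions are skipped because they are not in $I$.

```python
def satisfies_sufficient_criterion(mdp, blocks):
    """Proposition 1: every non-trivial block has one successor or no choices."""
    for block in blocks:
        if len(block) <= 1:
            continue  # trivial partitions are not in I
        successors = {t for s in block for succs in mdp.get(s, {}).values()
                      for t, prob in succs if prob > 0} - block
        single_successor = len(successors) == 1
        no_choices = all(len(mdp.get(s, {})) == 1 for s in block)
        if not (single_successor or no_choices):
            return False
    return True


# A block with a choice in s0 and two successors (ok / fail) violates both conditions.
risky = {
    "s0": {"safe": [("s1", 1.0)], "fast": [("ok", 0.6), ("fail", 0.4)]},
    "s1": {"go": [("ok", 0.9), ("fail", 0.1)]},
}
print(satisfies_sufficient_criterion(risky, [{"s0", "s1"}, {"ok"}, {"fail"}]))  # False
```

Note that the criterion is only sufficient: a block that fails it may still admit optimal local subpolicies.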


Beyond Optimal Local Subpolicies. The efficiency of our approach is partly 
due to the assumption in Definition 4. We observe that adapting this definition 
allows for a spectrum of specific yet useful cases. In particular, say that our
system describes a protocol in which we must optimize the probability to satisfy
N tasks, each of which may fail; the subMDPs then have two successor states. Often, it is
then easy to see (and model) that a locally optimal policy will aim to satisfy
each task and thus that the locally optimal policy optimizes the probability to
reach the corresponding successor state. Then, by adapting the target states in
Definition 3 to be the successor state where the task is successful, the notion
of an optimal policy, and thus of an optimal local subpolicy, changes. These
changes are minimal, and everything that follows below is easily adapted to this
setting, as demonstrated by the prototypical implementation.


4 Solving hMDPs with Abstraction-Refinement 


In this section, we consider hMDPs with optimal local subpolicies. We develop,
step by step, a sketch of an anytime algorithm that provides lower and upper bounds
on the expected reward in such an hMDP. In Sect. 4.1, we introduce an alternative
representation of our problem that formalizes the idea of analysing subMDPs
individually. We then formalize the ideas that allow us to construct an anytime
algorithm in Sect. 4.2. In Sect. 4.3, we introduce the abstract requirements for
integrating the analysis of sets of subMDPs into the algorithm, and finally, in Sect. 4.4 we
introduce a method that realises this using pMDPs.


4.1 The Macro-MDP Formulation 


We adapt macro-MDPs [5] which summarize the subMDPs by single states. 


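Definition 5 (Macro-MDP). Let $\mathcal{M}$ be an hMDP with $n$ non-trivial partitions $S_i$ and
$S_{\mathcal{M}}$ partitioned as $S_{\mathcal{M}} = \biguplus_i S_i \uplus S'$. The macro-MDP is defined as
$\mu(\mathcal{M}) := (S' \cup \{\mathrm{entry}_i \mid 1 \leq i \leq n\}, A_{\mathcal{M}}, \iota_{\mathcal{M}}, \emptyset, P, r, T_{\mathcal{M}})$ with $P$ and $r$ given by

$$P(s,a,s') = \begin{cases} \Pr_{\mathcal{M}_i[\sigma_i]}(\lozenge\{s'\}) & \text{if } s \in S_i,\\ P_{\mathcal{M}}(s,a,s') & \text{otherwise,} \end{cases} \qquad r(s) = \begin{cases} \mathrm{ER}_{\mathcal{M}_i[\sigma_i]}(\lozenge G_i) & \text{if } s \in S_i,\\ r_{\mathcal{M}}(s) & \text{otherwise,} \end{cases}$$

where $\mathcal{M}_i$ is the corresponding subMDP (see Definition 3) and $\sigma_i$ is an arbitrary
but fixed optimal policy, i.e., a policy such that $\mathrm{ER}_{\mathcal{M}_i[\sigma_i]}(\lozenge G_i) = \mathrm{ER}^{\max}_{\mathcal{M}_i}(\lozenge G_i)$.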


Intuitively, we replace the transitions within $S_i$ by a 'big-step semantics' that
aggregates the transitions within $S_i$ into single transitions such that the
probability to reach any successor matches the probability to do so within $S_i$ under
a specific, optimal policy. Likewise, the expected reward matches the expected
reward collected in $S_i$.¹


Remark 1. To define a unique macro-MDP, we can take the lexicographically
smallest policy $\sigma_i$ among the optimal policies. Furthermore, we observe that for
the cases covered by Proposition 1, it is not necessary to compute $\sigma_i$ at all:
Either there is a single successor, implying $\Pr_{\mathcal{M}_i[\sigma_i]}(\lozenge\{s'\}) = 1$ for any $\sigma_i$, or
$|\Sigma(\mathcal{M}_i)| = 1$.


The following theorem formalises that, given the assumptions, taking the big- 
step semantics is adequate when optimizing for an expected reward. 


1 Due to the additive nature of expected rewards, we can annotate the state with the
expected reward even though it may differ over the different paths to an exit of $S_i$.
Theorem 1. Let $\mathcal{M}$ be an hMDP with optimal local subpolicies and let $\mu(\mathcal{M})$ be
the corresponding macro-MDP. Then: $\mathrm{ER}^{\max}_{\mathcal{M}}(\lozenge T) = \mathrm{ER}^{\max}_{\mu(\mathcal{M})}(\lozenge T)$.


The key ingredient is the optimal local subpolicies, which ensure that
we aggregate behavior within the partitions consistently with a
(globally) optimal policy. We give a proof in the appendix².


Naive Algorithm. Algorithmically, we first compute $\mathrm{ER}^{\max}_{\mathcal{M}_i}(\lozenge G_i)$ and the
associated policy $\sigma_i$, then compute the reachability probabilities on the induced
Markov chain $\mathcal{M}_i[\sigma_i]$. We collect these results in a vector $\mathrm{res}_i$, which helps to
construct the macro-MDP. To clarify further constructions in this paper, we make
$\mathrm{res}_i$ explicit. Recall that $|\mathrm{succ}_{\mathcal{M}}(S_i)| = Y$ for all $i$.


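Definition 6 (Results for subMDP). Let $\mathcal{M}_i$ be a subMDP for the
partition $S_i$ of an hMDP $\mathcal{M}$. Let $\mathrm{succ}_{\mathcal{M}}(S_i)$ be ordered. We define $\mathrm{res}_i \in \mathbb{R}^{Y+1}$ s.t.

$\mathrm{res}_i(j) := \Pr_{\mathcal{M}_i[\sigma_i]}(\lozenge\{\mathrm{succ}_{\mathcal{M}}(S_i)_j\})$ for $0 \leq j < Y$ and $\mathrm{res}_i(Y) := \mathrm{ER}^{\max}_{\mathcal{M}_i}(\lozenge G_i)$,
where $\sigma_i$ is an arbitrary but fixed policy such that $\mathrm{ER}_{\mathcal{M}_i[\sigma_i]}(\lozenge G_i) = \mathrm{ER}^{\max}_{\mathcal{M}_i}(\lozenge G_i)$.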


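This allows us to reformulate the macro-MDP; in particular, the following two
identities hold:

$$P(s,a,s') = \begin{cases} \mathrm{res}_i(j) & \text{if } s \in S_i \text{ and } s' = \mathrm{succ}_{\mathcal{M}}(S_i)_j,\\ P_{\mathcal{M}}(s,a,s') & \text{otherwise,} \end{cases} \qquad r(s) = \begin{cases} \mathrm{res}_i(Y) & \text{if } s \in S_i,\\ r_{\mathcal{M}}(s) & \text{otherwise.} \end{cases} \tag{1}$$

These identities show that the macro-MDP can be constructed by precomputing
the necessary result vectors.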

Enumeration baseline: With macro-MDPs, we reduce the computation
of $\mathrm{ER}^{\max}_{\mathcal{M}}(\lozenge T)$ to (i) analysing all subMDPs $\mathcal{M}_i$ and (ii) analysing $\mu(\mathcal{M})$.
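The identities in (1) translate directly into code. In this minimal sketch (names and encoding are hypothetical), each non-trivial partition is summarised by its entry state, its ordered successor list, and its result vector $\mathrm{res}_i$ of length $Y + 1$. The entry state receives a single aggregated action, here named `macro`, whose transition probabilities are $\mathrm{res}_i(0), \dots, \mathrm{res}_i(Y-1)$, and the reward $\mathrm{res}_i(Y)$. Trivial states keep their original transitions and rewards.

```python
def build_macro_mdp(trivial_mdp, trivial_reward, summaries):
    """Construct mu(M) from the trivial states and per-partition result vectors.

    summaries: list of (entry_state, ordered_successors, res) with len(res) == Y + 1.
    """
    P = {s: dict(actions) for s, actions in trivial_mdp.items()}
    r = dict(trivial_reward)
    for entry, successors, res in summaries:
        Y = len(successors)
        if len(res) != Y + 1:
            raise ValueError("res_i must hold Y probabilities and one reward")
        # Eq. (1): P(entry, a, succ_j) = res_i(j) and r(entry) = res_i(Y).
        P[entry] = {"macro": [(successors[j], res[j]) for j in range(Y)]}
        r[entry] = res[Y]
    return P, r


P, r = build_macro_mdp(
    trivial_mdp={},
    trivial_reward={},
    summaries=[("A", ["B"], [1.0, 4.0]),
               ("B", ["ok", "fail"], [0.75, 0.25, 2.0])],
)
print(P["B"])  # {'macro': [('ok', 0.75), ('fail', 0.25)]}
print(r)       # {'A': 4.0, 'B': 2.0}
```

The resulting macro-MDP has one state per subMDP, which is where the memory savings of the enumeration baseline come from.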


This rather naive algorithm already limits memory and may exploit
similarities between subMDPs during the analysis, e.g., based on the structure discussed
in Sect. 4.4. It performs well if the number $|I|$ of subMDPs is sufficiently small.
We are interested in methods that allow for larger $|I|$ or larger
subMDPs. In particular, we want to avoid analysing every subMDP individually.


4.2 The Uncertain Macro-MDP Formulation 


Uncertainty Before Computation. We start by introducing a method that provides
bounds on the expected rewards after individually analysing only a subset
of the subMDPs. Before computing the individual probabilities in $\mathcal{M}_i$, we are
uncertain about the probabilities and rewards in the MDP $\mu(\mathcal{M})$. Under this

2 See: https://doi.org/10.48550/arXiv.2206.02653.
uncertainty, we may not be able to compute $\mathrm{ER}^{\max}_{\mu(\mathcal{M})}(\lozenge T)$ precisely. However, we
may solve the problem statement by bounding the expected reward. Thus, the
goal is to compute values lb, ub s.t.

$$\mathrm{lb} \leq \mathrm{ER}^{\max}_{\mu(\mathcal{M})}(\lozenge T) = \mathrm{ER}^{\max}_{\mathcal{M}}(\lozenge T) \leq \mathrm{ub}. \tag{2}$$


Uncertain Macro-MDPs. We capture the a-priori uncertainty about the sub- 
MDP results in an uncertain macro-MDP, a particularly shaped parametric 
MDP. 


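Definition 7 (Uncertain macro-MDP). Let $\mathcal{M}$ be an hMDP with $n$ non-trivial
partitions $S_i$ and $S_{\mathcal{M}}$ partitioned as $S_{\mathcal{M}} = \biguplus_i S_i \uplus S'$. The uncertain
macro-MDP is defined as $\nu(\mathcal{M}) := (S' \cup \{\mathrm{entry}_i \mid 1 \leq i \leq n\}, A_{\mathcal{M}}, \iota_{\mathcal{M}}, \vec{x}, P, r, T_{\mathcal{M}})$
with parameters $\vec{x} := \{p_{i,j}, q_i \mid 1 \leq i \leq n,\ 0 \leq j < Y\}$, where $Y = |\mathrm{succ}_{\mathcal{M}}(S_i)|$, and $P$ and $r$ given by

$$P(s,a,s') = \begin{cases} p_{i,j} & \text{if } s \in S_i \text{ and } s' = \mathrm{succ}_{\mathcal{M}}(S_i)_j,\\ P_{\mathcal{M}}(s,a,s') & \text{otherwise,} \end{cases} \qquad r(s) = \begin{cases} q_i & \text{if } s \in S_i,\\ r_{\mathcal{M}}(s) & \text{otherwise.} \end{cases}$$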


Remark 2. Whenever $\mathcal{M}_i$ and $\mathcal{M}_k$ are isomorphic, we may reduce the parameters
and replace each occurrence of $p_{k,j}$ with $p_{i,j}$ and each occurrence of $q_k$ with
$q_i$.


The uncertain macro-MDP can be instantiated to coincide with the macro-MDP 
by setting the parameters accordingly. 


Theorem 2. Let $\mathcal{M}$ be an hMDP, $\mu(\mathcal{M})$ the associated unique macro-MDP, and
$\nu(\mathcal{M})$ the associated uncertain macro-MDP with parameters $p_{i,j}$ and $q_i$. Let $u^*$
be a parameter valuation with $u^*(p_{i,j}) = \mathrm{res}_i(j)$ and $u^*(q_i) = \mathrm{res}_i(Y)$ for all $i, j$.
Then: $\nu(\mathcal{M})[u^*] = \mu(\mathcal{M})$.

Proof sketch. The constructions of the uncertain macro-MDP and the
macro-MDP differ only in the assignment of probabilities and rewards. We set $u^*$ as in the
characterisation in (1), and thus the equality follows.


Computing Bounds. Assume for now that we can derive some (trivial) sound
bounds on the result vector of any subMDP $\mathcal{M}_i$³.


Definition 8 (Sound bounds on results). For $\mathcal{M}_i$, the vectors $\mathrm{lbres}_i$ and
$\mathrm{ubres}_i$ are sound bounds if the following pointwise inequality holds:

$$\mathrm{lbres}_i \leq \mathrm{res}_i \leq \mathrm{ubres}_i. \tag{3}$$


3 We discuss our approach in Sect. 4.4; alternatively, one may use bounds from, e.g.,
[4].
These bounds on properties in the subMDP correspond to bounds on the
parameters of the uncertain macro-level MDP $\nu(\mathcal{M})$. Let us formalize this idea.


Definition 9 (Suitable parameter region). Let $u^*$ be as in Theorem 2. The
bounds $u^-, u^+$ are suitable if $u^- \leq u^* \leq u^+$. For suitable $u^-, u^+$, the region
$[\![u^-, u^+]\!]$ is called suitable.

Using this notion, sound bounds $\mathrm{lbres}_i$ and $\mathrm{ubres}_i$ yield suitable bounds
$u^-(x), u^+(x)$ for all $x \in \bigcup_j \{p_{i,j}\} \cup \{q_i\}$. Combining the sound bounds for every $i$
yields a suitable region. Formally:


[Figure 4: after an initialisation step, a loop alternates between analysing the uncertain macro-MDP and analysing an individual subMDP, and outputs the bounds [lb, ub] and a policy σ.]

Fig. 4. Analysing hMDPs via uncertain macro-MDPs with individual refinement.
Lemma 1. Given sound bounds $\mathrm{lbres}_i, \mathrm{ubres}_i$ for each $i$, there exists a trivial
mapping Reg s.t. $\mathrm{Reg}(\mathrm{lbres}_1, \dots, \mathrm{lbres}_n, \mathrm{ubres}_1, \dots, \mathrm{ubres}_n)$ is a suitable region.

With the suitable region, we can apply verification to the parametric MDP.
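Lemma 2. Let $R$ be a suitable region. Then:

$$\min_{u \in R} \mathrm{ER}^{\max}_{\nu(\mathcal{M})[u]}(\lozenge T) \leq \mathrm{ER}^{\max}_{\mathcal{M}}(\lozenge T) \leq \max_{u \in R} \mathrm{ER}^{\max}_{\nu(\mathcal{M})[u]}(\lozenge T).$$

Proof sketch. The inequalities follow from the fact that $u^* \in R$
with $u^*$ as in Theorem 2. By that theorem, $\mathrm{ER}^{\max}_{\nu(\mathcal{M})[u^*]}(\lozenge T) = \mathrm{ER}^{\max}_{\mu(\mathcal{M})}(\lozenge T)$. The
statement then follows from Theorem 1.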


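From the bounds that we can compute using a suitable region $R$, we then set lb
and ub for Eq. (2):

$$\mathrm{lb} \leq \min_{u \in R} \mathrm{ER}^{\max}_{\nu(\mathcal{M})[u]}(\lozenge T) \leq \mathrm{ER}^{\max}_{\mathcal{M}}(\lozenge T) \leq \max_{u \in R} \mathrm{ER}^{\max}_{\nu(\mathcal{M})[u]}(\lozenge T) \leq \mathrm{ub}. \tag{4}$$

Computationally, we may use parameter lifting [33] to find these values.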


Refinement Loop. The complete anytime algorithm is summarized in Fig. 4. We
start with an hMDP $\mathcal{M}$ and extract the uncertain macro-MDP $\nu(\mathcal{M})$ and the
subMDPs $\{\mathcal{M}_i\}_{i \in I}$⁴. Furthermore, we compute (trivial) sound bounds
$\mathrm{lbres}_i \leq \mathrm{res}_i \leq \mathrm{ubres}_i$. This leads to a suitable region $[\![u^-, u^+]\!] = \mathrm{Reg}(\mathrm{lbres}_1, \dots, \mathrm{ubres}_n)$.
Then, we may at any time compute the bounds lb, ub on the expected reward


4 For efficiency, one must implement extraction without first computing an explicit 
representation of M. 
in the hMDP $\mathcal{M}$ by analysing $\nu(\mathcal{M})$ on the region $[\![u^-, u^+]\!]$. To tighten these
bounds, we must first refine the suitable region. To this end, we analyse individual
subMDPs $\mathcal{M}_i$ and compute $\mathrm{res}_i$ and thus $u^*(x)$ for $x \in \bigcup_j \{p_{i,j}\} \cup \{q_i\}$. This refines
the suitable bounds such that $u^-(x) = u^*(x) = u^+(x)$ for $x \in \bigcup_j \{p_{i,j}\} \cup \{q_i\}$. We call
this refinement individual refinement. The new region is suitable and Theorem 2
ensures correctness of the refinement. As we only have finitely many subMDPs,
we obtain lb = ub after finitely many steps.


Anytime version of the enumeration baseline. Individually refine any
subset of subMDPs, then analyse the uncertain macro-MDP $\nu(\mathcal{M})$.
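For the running example of Sect. 2, the anytime enumeration baseline fits in a few lines. The sketch assumes the subMDP reward 2/p and the trivial sound bounds [64/25, 25/4] on every $q_i$ from Sect. 2. It refines macro states one at a time in breadth-first order (a simplification: any order is sound, while the approach can prioritise critical subMDPs) and stops once $\eta \cdot \mathrm{ub} \leq \mathrm{lb}$. Because every macro state of this acyclic chain is visited at most once, its expected number of visits equals its reach probability, so the bounds are sums of reach probability times reward bound. All names are hypothetical.

```python
from fractions import Fraction

N = 3
P_INIT = Fraction(1, 2)
TRIVIAL = (Fraction(64, 25), Fraction(25, 4))  # sound bounds on every q_i


def macro_states():
    """List (level, p, reach probability) of the macro chain, merging equal p."""
    layer = {P_INIT: Fraction(1)}
    states = []
    for level in range(N):
        nxt = {}
        for p, prob in layer.items():
            states.append((level, p, prob))
            for q in (p * Fraction(4, 5), p * Fraction(5, 4)):
                nxt[q] = nxt.get(q, Fraction(0)) + prob / 2
        layer = nxt
    return states


def current_bounds(states, bounds):
    """lb and ub: sums of reach probability times lower/upper reward bound."""
    lb = sum(prob * bounds[(lvl, p)][0] for lvl, p, prob in states)
    ub = sum(prob * bounds[(lvl, p)][1] for lvl, p, prob in states)
    return lb, ub


def anytime(eta):
    """Individual refinement until eta * ub <= lb; returns every (lb, ub) pair."""
    states = macro_states()
    bounds = {(lvl, p): TRIVIAL for lvl, p, _ in states}
    history = [current_bounds(states, bounds)]
    for lvl, p, _ in states:
        lb, ub = history[-1]
        if eta * ub <= lb:
            break
        exact = 2 / p  # "analyse" the subMDP individually: res_i(Y) = 2/p
        bounds[(lvl, p)] = (exact, exact)
        history.append(current_bounds(states, bounds))
    return history


for lb, ub in anytime(eta=1):
    print(float(lb), float(ub))
```

With η = 1, the printed bounds tighten from [7.68, 18.75] to the exact value 12.3025 after all six macro states have been refined; a smaller η lets the loop stop earlier.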


4.3 Set-Based SubMDP Analysis 


Next, we aim to provide an alternative refinement procedure that analyses a set
of subMDPs at once, i.e., that refines the suitable bounds for a set of parameters
at once. We denote the set of goal states for all subMDPs as $G$⁵.


Adequate Abstractions. We aim to compute sound bounds on the results for a
set of subMDPs such that the bounds are sound for every individual subMDP
in this set. We generalize Definition 8 as follows: The (lower and upper) bounds
$\mathrm{lbres}_J, \mathrm{ubres}_J$ are sound if they are sound (lower and upper) bounds for every
$\mathrm{res}_i$, $i \in J$.


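Lemma 3. Let $\mathrm{lbres}_J$ satisfy the following inequalities for $0 \leq j < Y$:

$$\mathrm{lbres}_J(Y) \leq \min_{i \in J} \mathrm{ER}^{\max}_{\mathcal{M}_i}(\lozenge G) \quad\text{and}\quad \mathrm{lbres}_J(j) \leq \min_{i \in J} \min_{\sigma \in \Sigma(\mathcal{M}_i)} \Pr_{\mathcal{M}_i[\sigma]}(\lozenge\{\mathrm{succ}_{\mathcal{M}}(S_i)_j\}). \tag{5}$$

Then, $\mathrm{lbres}_J$ is a sound lower bound.

Proof sketch. We must show $\mathrm{lbres}_J \leq \mathrm{res}_i$ for each $i \in J$. By definition, for each
$0 \leq j \leq Y$, $\mathrm{lbres}_J(j) \leq \min_{i' \in J} \mathrm{res}_{i'}(j)$, and trivially $\min_{i' \in J} \mathrm{res}_{i'}(j) \leq \mathrm{res}_i(j)$.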

We omit the analogous statement for $\mathrm{ubres}_J$⁶. In Sect. 4.4, we discuss a
particular approach to obtain these bounds, i.e., the right-hand sides of the inequalities
in Eq. (5). Here, we update the algorithm sketch to handle this alternative
refinement.


Remark 3. We cannot compute the optimal policy $\sigma_i$ for the subMDP $\mathcal{M}_i$ in
this setting. Thus, we must compute probability bounds for all policies, which
may make these bounds weak. Some optimizations are possible, as some actions
can in fact be excluded. More importantly, however, for cases covered by
Proposition 1 the policy $\sigma_i$ is irrelevant.


5 Formally, we label the goal states and use $G$ to denote those states.
6 In which min becomes max and the inequalities flip.
Updated Algorithm. We update the loop from Fig. 4: rather than refining using a single i, we refine using a set I. Instead of res_i, we use Lemma 3 to compute sound bounds lbres_I, ubres_I, and call this set-based refinement. We may set lbres_i = lbres_I (and ubres_i = ubres_I) for each i ∈ I. Then, we can compute a new suitable region via Lemma 1. With the suitable region, we can still utilise Eq. (4) to compute an approximation [lb, ub]. However, for completeness, we must ensure that if |I| = 1, the upper and lower bounds coincide, i.e., lbres_{{i}} = ubres_{{i}} for every i. This can be ensured by using individual subMDP refinement when |I| = 1.
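The bookkeeping behind this update can be sketched as follows: every subMDP inherits the bounds of the analysed set that contains it, and individually analysed subMDPs get coinciding bounds. The dictionary-based representation and the name per_submdp_bounds are hypothetical; the resulting per-subMDP bounds are what Lemma 1 would consume to compute a suitable region.

```python
from typing import Dict, FrozenSet, List, Tuple

Vector = List[float]
Bounds = Tuple[Vector, Vector]  # (lbres, ubres)


def per_submdp_bounds(
    set_bounds: Dict[FrozenSet[int], Bounds],
    exact: Dict[int, Vector],
) -> Dict[int, Bounds]:
    """Give every subMDP i the bounds of the analysed set I containing it.

    Exact results res_i from individual refinement take precedence, so the
    lower and upper bound coincide for every individually analysed subMDP.
    """
    bounds: Dict[int, Bounds] = {}
    for index_set, (lbres, ubres) in set_bounds.items():
        for i in index_set:
            bounds[i] = (lbres, ubres)  # lbres_i := lbres_I, ubres_i := ubres_I
    for i, res in exact.items():
        bounds[i] = (res, res)  # |I| = 1: lbres_{i} = res_i = ubres_{i}
    return bounds


result = per_submdp_bounds({frozenset({0, 1}): ([1.0], [3.0])}, {2: [2.5]})
```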


Idea: We may improve the anytime algorithm by iteratively considering
sets of subMDPs and extracting sound bounds.


We first discuss the set-based analysis of multiple subMDPs M_i. We clarify the realization of the loop box in Sect. 5.


[Figure: hMDP → init → loop (analyse uncertain macro-MDP; analyse set of subMDPs → bounds) → [lb, ub], σ]


Fig. 5. Analysing hMDPs with set-based refinement on templated subMDPs.


4.4 Templates for Set-Based subMDP Analysis 


We present an instance of set-based subMDP analysis where the subMDPs can be described as instantiations of parametric MDPs.


Parametric Templates. We observe that the subMDPs are often similar, e.g., they describe sending a file over a channel or exploring a room under different conditions. We capture this similarity as follows: let {T_1, ..., T_m} be a set of parametric MDPs, where we call each pMDP a template. In particular, for a hierarchical MDP M with partitioning S_1, ..., S_n and corresponding subMDPs M_1, ..., M_n, a subMDP M_i is an instantiation of template T_j with parameter instantiation v if M_i = T_j[v].⁷ For a concise description, this paper considers hMDPs over a single template T and, for any I ⊆ 𝓘 = {1, ..., n}, we denote by V_I := {v_i | i ∈ I} the finite (multi)set of parameter instantiations for the pMDP T such that T[v_i] = M_i.


Abstractions from Templates. In terms of the templates, Lemma 3 requires us to bound the expected rewards ER_{T[v]}(◊G) for all v ∈ V_I. We realize this by defining the smallest region toRegion(V_I) ⊇ V_I. For this region, we obtain a lower bound on the expected rewards by computing the minimal expected reward over all instantiations in toRegion(V_I). That is:

$$\mathit{lbres}_I(Y) := \min_{v \in \mathit{toRegion}(V_I)} \mathrm{ER}_{T[v]}(\Diamond G) \le \min_{i \in I} \mathrm{ER}_{M_i}(\Diamond G).$$
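A small sketch of toRegion, assuming (as is usual for parameter lifting) that regions are axis-aligned boxes, i.e., one interval per parameter. The Python names and the example valuations are hypothetical.

```python
from typing import List, Sequence, Tuple

Valuation = Sequence[float]
Box = List[Tuple[float, float]]  # one (lower, upper) interval per parameter


def to_region(valuations: Sequence[Valuation]) -> Box:
    """Smallest axis-aligned box that contains every valuation in V_I."""
    if not valuations:
        raise ValueError("V_I must be non-empty")
    return [(min(coords), max(coords)) for coords in zip(*valuations)]


def in_region(box: Box, v: Valuation) -> bool:
    return all(lo <= x <= hi for (lo, hi), x in zip(box, v))


V_I = [(0.1, 3.0), (0.4, 1.0), (0.25, 2.0)]  # hypothetical valuations of two parameters
region = to_region(V_I)
print(region)  # [(0.1, 0.4), (1.0, 3.0)]
```

Because the box also contains valuations that are not in V_I, a minimum over the box can only be lower than the minimum over V_I; this is why the bound is sound but not exact, and why splitting I (and thus shrinking the box) tightens it.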


7 We use v instead of u to avoid confusion with the instantiations for the pMDP v(M).
We handle the probabilities analogously, while taking into account the quantification over the policies. Following Lemma 3, these bounds are sound. Upper bounds are handled analogously. Computationally, we again use parameter lifting [33] to find these bounds. We can easily refine: whenever we split I (or, equivalently, V_I) into subsets I′, we can compute (potentially) smaller regions toRegion(V_{I′}).

In Fig. 5, we depict our method. In contrast to Fig. 4, we pass the template T rather than the individual subMDPs. Furthermore, we now compute initial sound bounds via the analysis of the template (i.e., of V_I) and must pass the mapping from I to V_I to clarify the shape of the subMDPs.


Abstraction-Refinement on the subMDPs provides increasingly tight 
suitable regions for the uncertain macro-MDP from the anytime baseline. 


Algorithm 1. Algorithm for Abstraction-Refinement Procedure


1: Construct macro-MDP v(M), class-MDP T, and V_𝓘 from high-level description.
2: Q ← {(I = 𝓘, bounds = (0, ∞), weightedvals = 𝓘 → {1})}    ▷ 𝓘: all subMDP indices
3: lb ← 0; ub ← ∞; #iter ← 0; Res ← ∅
4: while η · ub > lb do
5:     R ← Q.pop()    ▷ Use priority
6:     if R.I = {i} then
7:         Res[i] ← check_one(T[v_i])    ▷ Computes res_i
8:     else
9:         R.bounds ← check_set(T, toRegion(V_{R.I}))    ▷ Computes lbres_{R.I}, ubres_{R.I}
10:        Q ← Q ∪ split(R)    ▷ Split R.I, keep bounds and weights
11:    end if
12:    if #iter mod k = 1 or Q is empty then
13:        R′ ← Reg(extract(Q, Res))    ▷ Compute suitable region via Lem. 1
14:        lb, ub ← check_set(v(M), R′)
15:    end if
16: end while


5 Implementing the Abstraction-Refinement Loop 


Algorithm 1 outlines a basic implementation of the idea sketched in Fig. 5. We 
detail this implementation and then discuss an essential improvement. 

We construct v(M), T, and the (implicit) mapping V: 𝓘 → V_𝓘, which maps subMDPs to instantiations of T, from a suitable high-level representation; here, 𝓘 denotes the set of all subMDP indices. We initialize a priority queue with triples that represent sets of template instantiations: initially the set 𝓘, where V_𝓘 := {v_i := V(i) | i ∈ 𝓘} contains all valuations v such that T[v] is a subMDP of M. We initially store bounds reflecting lbres_𝓘 and ubres_𝓘, as well as weights for the computation of the priority (see below). Initially, we assume that lb = 0 and ub = ∞, and we count the number of iterations in #iter. Res is a map for storing result vectors. The algorithm then refines lb and ub until the gap between lb and ub is sufficiently small.
The main loop iteratively refines lb and ub by first refining lbres_I and ubres_I, that is, by splitting I and model checking T w.r.t. successively smaller regions toRegion(V_I) (l. 5-11): we take a set R from the queue. If R.I = {i} is a singleton, we compute lbres_{R.I} = res_i = ubres_{R.I} and store this result. Otherwise, we apply model checking to the pMDP T w.r.t. the region representation of R.I. We then split R.I into (here) two subsets. For splitting, we use the geometric interpretation of toRegion(V_{R.I}) as a subset of the parameter space of T, which we split along one of the axes into two equally large subsets. Every k iterations (we use k = 8), we analyse the macro-MDP (l. 12-15). From Q and Res, we extract the proper bounds lbres_i, ubres_i: from Res[i] if possible, and otherwise from Q using R.bounds for the R such that i ∈ R.I. Then, via Reg(lbres_i, ubres_i, ...) from Lemma 1, we compute a suitable region R′. We analyse the uncertain macro-MDP to obtain lb and ub in accordance with Eq. (4).
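A sketch of such a split, with hypothetical names. It rests on two assumptions not fixed by the text: the axis is the widest side of the bounding box, and "equally large" is read as equal cardinality (a median split). The latter keeps both halves non-empty and strictly smaller whenever |I| ≥ 2, as the termination argument requires.

```python
from typing import Dict, List, Sequence, Tuple


def split_indices(
    indices: Sequence[int], valuation_of: Dict[int, Sequence[float]]
) -> Tuple[List[int], List[int]]:
    """Split a set I of subMDP indices into two halves along one axis of toRegion(V_I).

    Assumptions: the axis is the widest side of the bounding box, and
    "equally large" means equal cardinality (a median split).
    """
    if len(indices) < 2:
        raise ValueError("singletons are analysed individually, not split")
    dims = len(valuation_of[indices[0]])

    def extent(d: int) -> float:
        coords = [valuation_of[i][d] for i in indices]
        return max(coords) - min(coords)

    axis = max(range(dims), key=extent)  # widest side of the box
    ordered = sorted(indices, key=lambda i: valuation_of[i][axis])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


valuation_of = {0: (0.1, 3.0), 1: (0.4, 1.0), 2: (0.25, 2.0), 3: (0.3, 1.5)}
print(split_indices([0, 1, 2, 3], valuation_of))  # ([1, 3], [2, 0])
```

Halving the box at its midpoint instead would give equal-volume sub-regions, but one side could then be empty.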

Finally, we discuss the priority function. If we a priori naively assume that each subMDP contributes an equal amount to the overall minimal expected reward in the hMDP (all weights are one), then the priority function |R.bounds| · Σ_{i∈R.I} R.weights(i) computes priorities that correlate with how much computing res_i for all i ∈ R.I would reduce the gap between lb and ub.
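A sketch of this priority on top of Python's heapq. Two assumptions: each queue entry carries a scalar (lower, upper) interval (the paper's bounds are vectors, so picking one entry, e.g., the reward, is an assumption), and missing weights default to one, matching the naive initialization. The class and function names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class QueueEntry:
    indices: FrozenSet[int]        # R.I
    bounds: Tuple[float, float]    # (lower, upper) bound for the set, one scalar entry
    weights: Dict[int, float] = field(default_factory=dict)  # R.weights; missing = 1


def priority(entry: QueueEntry) -> float:
    """|R.bounds| times the summed weights of the subMDPs in R.I."""
    lower, upper = entry.bounds
    return (upper - lower) * sum(entry.weights.get(i, 1.0) for i in entry.indices)


def push(queue: List[Tuple[float, int, QueueEntry]], entry: QueueEntry, tiebreak: int) -> None:
    # heapq is a min-heap: store the negated priority so the largest pops first,
    # and a tiebreak counter so that entries themselves are never compared.
    heapq.heappush(queue, (-priority(entry), tiebreak, entry))


queue: List[Tuple[float, int, QueueEntry]] = []
push(queue, QueueEntry(frozenset({0, 1}), (1.0, 3.0)), 0)         # 2.0 * 2 = 4.0
push(queue, QueueEntry(frozenset({2}), (0.0, 5.0), {2: 0.5}), 1)  # 5.0 * 0.5 = 2.5
push(queue, QueueEntry(frozenset({3, 4, 5}), (2.0, 2.5)), 2)      # 0.5 * 3 = 1.5
first = heapq.heappop(queue)[2]
```

Updating the weights (e.g., with expected visits) only changes R.weights; entries already in the heap would have to be re-pushed to reflect the new priorities.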


Termination and Correctness Argument. Algorithm 1 terminates: we split in such a way that max_{R∈Q} |R.I| monotonically decreases. Thus, eventually Q is empty and Res contains results for all subMDPs. Then, R′ is a point region, and checking v(M) with this point region ensures that lb = ub. Correctness follows as R′ is always suitable, see Eq. (4).


Computing Expected Visits. Based on our empirical evaluation, we added one crucial improvement. The algorithm above assumes that all subMDPs (or states in the macro-MDP) are equally important, but this assumption is generally inadequate. Roughly, only states reached by the optimal policy contribute at all (provided the bounds are tight enough that we can identify these states), and these reachable states are weighted by their expected number of visits. We approximate this expected number of visits by computing the currently optimizing policy (a by-product of l. 13) and the center of R′; this yields an MC for which we can compute the expected number of visits by a standard equation system [32]. Additionally, we update the weights of the regions in the queue based on these new results. We remark that this also makes the priority function more useful.
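For an absorbing Markov chain, the expected numbers of visits x to the transient states satisfy x = e_init + Qᵀx, where Q holds the transient-to-transient probabilities. The sketch below solves this system exactly with Gauss-Jordan elimination over Fractions. It assumes the MC induced by the policy and the center of R′ has already been built. The function name and the two-state example are hypothetical, and a real implementation would more likely use a numerical solver.

```python
from fractions import Fraction
from typing import List


def expected_visits(Q: List[List[Fraction]], init: int) -> List[Fraction]:
    """Expected number of visits to each transient state of an absorbing MC.

    Q[s][t] is the probability of moving from transient state s to transient
    state t. Solves (I - Q^T) x = e_init with Gauss-Jordan elimination; the
    system is non-singular when every transient state eventually leaves the
    transient part with probability one.
    """
    n = len(Q)
    # Augmented matrix [I - Q^T | e_init].
    A = [[Fraction(int(r == c)) - Q[c][r] for c in range(n)] + [Fraction(int(r == init))]
         for r in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][n] / A[r][r] for r in range(n)]


half, quarter = Fraction(1, 2), Fraction(1, 4)
# s0 loops with 1/2 and moves to s1 with 1/2; s1 returns to s0 with 1/4 and is absorbed otherwise.
Q = [[half, half], [quarter, Fraction(0)]]
print(expected_visits(Q, init=0))  # [Fraction(8, 3), Fraction(4, 3)]
```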


Interleaving Individual Refinement. Furthermore, subMDPs for which the expected number of visits is large⁸ are analysed individually (and their points are removed from the regions in the queue). This optimization reduces the need to split the corresponding regions until we obtain tight bounds.


8 In our implementation, we define these as the subMDPs whose expected number of visits is in the top (1 + #iter/16) percent, but no more than 150 at a time.
6 Experiments


Implementation. We implemented LEVEL-UP⁹, a prototype on top of the Python bindings for Storm [20]. LEVEL-UP analyses hierarchical MDPs by taking two MDPs, each provided as a probabilistic program description in the PRISM format: one MDP that encodes the (uncertain) macro-MDP and one that describes the parametric template for the subMDPs. The parameter instance of a subMDP can be deduced as a function of the high-level variable assignment of the macro-MDP states. For technical reasons, the prototype currently supports subMDPs with one or two successor states, arguably the setting in which we expect our prototype to perform best. For subMDPs with a single successor state, the uncertain macro-MDP may be represented as a (parameter-free) MDP with interval-valued rewards. For two successors, we support the extension of Sect. 3.3, where the successor aims to optimize reaching a fixed successor state.


Table 1. Benchmark statistics, runtimes of the approaches, and details for Algorithm 1. 
Name  Inst  |S_M|  |I|  |S_v(M)|  |Act_v(M)|  |S_T|  |Act_T|  t_init  t_enum  t_0.5  t_0.9  t_0.95 || iter  indrf  %um  %set  %ind
corr 1050-107 624 255576 541704 15000 65006 <1| 16) 3 9 13| 17 14 267 2 
corr 118100 108 624 254376 539040 60000 260006 <1|| 100/10 45 45 9 16 280 4 
corr 118,200 108 624 254376 539040 240000 1040006 2|| 689/51 313 568| 17 30 092 4 
corr is11s0 107 768 1024344 2172432 15000 65006 3| 21) 8 18 25| 17 17 5 36 1 
corrl 108 1056 34200 83160 33750 146256 <1] 90 4 21 38 17 43 0 84 8 
corrl 108 1128 39576 96768 33750 146256 <1] 98 4 38 38 17 45 0 84 8 
corrl 108 1632 89136 224160 33750 146256 <1 168| 5 44 67) 25 102 1 80 14 
mail io 10° 173857 793971 1088152 2801 3601 4 | 552| 8 21 48| 57 658 29 2 4 
mail 12 10° 236802 1446551 2023504 2801 3601 8 | 738/16 43 130| 97 703 42 2 
netw soso 10 9801 437823 437823 4026 4026 1| 23| 8 33 46) 217 150 60 1 
netw soso 10 9801 437823 437823 10041 10041 1| 62| 8 34 48| 217 150 59 3 3 
netw soso 108 9801 1025883 1025883 10041 10041 2| 62|16 94 112 225 150 62 1 
sdn  saz44 108 23375 128386 128386 13506 16855 <1|| 62| 2 20 112) 289 305 2 17 11 
sdn ssaa 10 23375 128386 128386 2802 3455 <1| 98) 1 5 15| 281 305 13 17 8 
sdn ssaa 10° 126337 408227 408227 2802 3455 2|] 519| 5 46 394| 3057 305 27 7 0 


Setup. We investigate the scalability and the quality of the approximation over time. To this end, we run our prototype on a MacBook (2020, M1) with an 8 GB RAM limit. We compare the enumerative baseline from Sect. 4.1 with Algorithm 1; both exploit the hierarchical nature of the MDP. We qualitatively compare to standard model checking on the flat MDP, see below. We use a collection of benchmarks reflecting networks, job schedulers, and robots.


Results. We consider instances that we summarize in Table 1. In particular, we 
give the benchmark name and instance for reference, the approximate number 
of states in the hierarchical MDP (computed from the macro-MDP and the 


9 The source code and executables, the benchmarks, logfiles and utilities are all available in an archived Docker container: https://doi.org/10.5281/zenodo.6524787.
subMDPs), the number of nontrivial partitions, and the number of states and actions in the (uncertain) macro-MDP and subMDPs, respectively. Then, we give the time t_init (in seconds) to set up the data structures from the high-level representation. We highlight that a flat representation of each of our benchmarks has at least 10⁷, often more, states. As a reference, we present the performance of the enumerative baseline from Sect. 4.1. This baseline already performs well, as it enables the verification of huge MDPs. A TO indicates a runtime above 1200 s. To scale to larger subMDPs or to more subMDPs, we use the abstraction-refinement loop. To reflect its anytime nature, we list three runtimes, for terminating when η · ub ≤ lb with η ∈ {0.5, 0.9, 0.95}, respectively. The largest of these times that is still faster than the enumerative baseline is highlighted (further to the right is better for the abstraction-refinement). For η = 0.95, we give details: the number of iterations (iter), the number of individual refinements based on the improvement from Sect. 5 (indrf), and the fraction of time spent on model checking the uncertain macro-MDPs (%um), on the set-refinements, and on the individual refinements, respectively.


Discussion. Before we discuss details of the results, let us clarify that exploiting the hierarchical structure is essential. MDPs with ~10⁸ states are at the limit of what fits in around 8 GB of memory.¹⁰ Symbolic methods based on MTBDDs easily scale beyond these sizes, but, as the subMDPs are all slightly different, the models we consider lack the symmetry that makes MTBDDs compact. Thus, support for hierarchical MDPs is a necessary step forward.
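As a rough check of this memory limit, take the 128 bytes per state from footnote 10 and read 8 GB as 8·10⁹ bytes (an assumption; binary gigabytes change the result only slightly):

$$\underbrace{8 \cdot 8\,\mathrm{B}}_{\text{doubles}} + \underbrace{16 \cdot 4\,\mathrm{B}}_{\text{ints}} = 128\,\mathrm{B}, \qquad \frac{8 \cdot 10^{9}\,\mathrm{B}}{128\,\mathrm{B}} = 6.25 \cdot 10^{7}\ \text{states}.$$

An explicit state space in the high 10⁷ range therefore already exhausts the 8 GB budget.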

Regarding the abstraction-refinement: while a larger study may be necessary, we can start with two standard observations. First, the abstraction-refinement loop is significantly faster for η ≤ 0.9; as η → 1, coarse abstractions are insufficient. Second, the efficiency of the abstraction-refinement heavily depends on the particular structure. That being said, the approach outperforms the enumerative approach, especially for η = 0.9, by up to more than an order of magnitude. This happens even if |I| is rather small, or if, e.g., J is small. We furthermore observe that for large |I|, the bookkeeping in Python becomes a bottleneck. We think these observations are promising: we left many options for further optimizations and tweaking towards particular examples on the table. However, for models where most time is spent on model checking the macro-level MDP, the approach is less suitable. We furthermore conjecture that tailored algorithms may exploit some of these dimensions, e.g., when the macro-MDP or the subMDPs are in fact MCs or perhaps acyclic, depending on the number of parameters and their influence [36], or based on the relative weight of the uncertain rewards compared to the rewards in the macro-MDP.


7 Related Work 


In the model-free reinforcement learning (RL) setting, hierarchical models are 
popular. An excellent, recent survey is given in [29]. Our work generalizes the 


10 Assuming 128 bytes per state, i.e., 8 doubles and 16 (32-bit) ints, as used in Storm.
solution techniques on hierarchical MDPs that assume that these subMDPs are the same. In RL, this assumption is treated liberally, and the methods provide only weak error bounds. In contrast, our model-based approach provides error bounds in every step, and the error vanishes in finitely many steps.

Hierarchical abstractions are used to analyse large MDPs in [5]. There, the goal is to find a policy that almost optimizes the reward. Rather than preimposing a hierarchy, the algorithm aims to find a hierarchy and define the goal states of the subMDPs such that the model admits local policies. Instead, our solution can find the optimal policy and, in particular, gives strict error bounds, at the cost of requiring a high-level model that induces the hierarchy. A symbolic approach for continuous MDPs, where the transition probabilities are the result of an associated LP, has recently been discussed in [24]. A hierarchical SCC decomposition [1] aims to accelerate the process of solving a (given, monolithic) Markov chain. The computation of reward-bounded properties [18] generalizes topological value iteration, and its notion of episodes mildly resembles a hierarchical approach, but no uncertainty is assumed or used in that approach. The probabilistic model checker PAT [35] analyses a hierarchical probabilistic timed automaton given as a process algebra; the hierarchy is not exploited in the solving process.

While symbolic approaches, often based on decision diagrams, exploit the transition system by compressing the data structures, abstractions aim to yield smaller systems from which an approximation of the sought-for values can be obtained. Abstraction-refinement without an imposed hierarchy is explored in [16,21,25]: there, refinement amounts to considering a better approximation of the state space. In contrast, we impose the hierarchy, the abstraction amounts to an imprecise analysis of this fixed state space, and we refine by analysing the state space more precisely (by analysing subMDPs at a greater level of detail). Contract-based abstractions (in probabilistic systems) are used to decompose the analysis of systems given by subsystems running in parallel [14,28,38]. Partial exploration and bounded model checking approaches focus on the most critical paths, i.e., the paths where most of the probability mass lies [7,23,26], but these approaches generally do not exploit the hierarchical and repetitive structure. The observation that many parts of the system are not critical allows us to weigh the potential benefit of refining the intervals in various parts of the macro-MDP.

Parametric MDPs are commonly used to model and analyse the effects 
of uncertainty in the precise transitions [15,23,31]. The methods presented 
in [13,22] exploit a repetitive structure in parametric MCs to accelerate the 
construction of closed-form solutions and are not applicable to MDPs. Parametric models have been used to support the design of systems [2,8] or their adaptation [6,9], to find policies for partially observable systems [11], to analyse Bayesian networks [34], and to speed up the analysis of, e.g., software product lines [10,37]. Beyond technical differences, none of these approaches uses a hierarchical decomposition of an MDP or uses the results of the analysis in the analysis of a larger MDP.
8 Conclusion


This paper presents a first verification approach that exploits a specific hierarchical structure, natural in many models, to accelerate the analysis of the underlying MDP. An essential ingredient is to separate the two levels of the hierarchy. Then, when analysing the (top-level) macro-MDP, we may treat subMDPs that have not yet been analysed as epistemic uncertainty. Analysis techniques for uncertain (more precisely: parametric) MDPs then enable an online approximation loop that incrementally removes uncertainty in a targeted fashion by analysing more and more subMDPs (more) precisely. Three clear directions for future work are to (i) consider an approach that lifts the restriction to locally-optimal policies, (ii) investigate the applicability to a richer set of temporal properties, and (iii) allow automatic detection of partitions in, e.g., the PRISM language.
Abstract. Existing neural network verifiers compute a proof that each input is handled correctly under a given perturbation by propagating a symbolic abstraction of reachable values at each layer. This process is repeated from scratch independently for each input (e.g., image) and perturbation (e.g., rotation), leading to an expensive overall proof effort when handling an entire dataset. In this work, we introduce a new method for reducing this verification cost without losing precision, based on the key insight that abstractions obtained at intermediate layers for different inputs and perturbations can overlap or contain each other. Leveraging this insight, we introduce the general concept of shared certificates, enabling proof effort reuse across multiple inputs to reduce overall verification costs. We perform an extensive experimental evaluation to demonstrate the effectiveness of shared certificates in reducing the verification cost on a range of datasets and attack specifications on image classifiers, including the popular patch and geometric perturbations. We release our implementation at https://github.com/eth-sri/proof-sharing.


Keywords: Neural Network Verification - Local Verification - 
Adversarial Robustness 


1 Introduction 


The success of neural networks across a wide range of application domains [21,30] 
has led to their widespread application and study. Despite this success, neural 
networks remain vulnerable to adversarial attacks [8,23], which raises concerns over their trustworthiness in safety-critical settings such as autonomous driving and medical devices. To overcome this barrier, formal verification of neural networks has been proposed as a key technology in the literature [39]. As a result, recent years have witnessed a growing interest in verifying critical safety properties of neural networks (e.g., fairness, robustness) [14,17,18,31,32,40,42], specified using pre- and postconditions over network inputs and outputs, respectively. Conceptually, existing verifiers propagate the sets of inputs described by the precondition, captured in symbolic form (e.g., convex sets), through the network, an expensive process that produces overapproximations of all possible values at intermediate layers. The final abstraction of the output can then be used to check postconditions. The key technical challenge all existing verifiers aim to address is speeding up and scaling the certification process, i.e., faster and more efficient propagation of symbolic shapes while reducing the overapproximation error.


This Work: Accelerating Certification via Proof Sharing. In this work, we propose a new, complementary method for accelerating neural network verification based on the key observation that, instead of treating each certification attempt in isolation as existing verifiers do, we can reuse proof effort among multiple such attempts, thus obtaining significant overall speed-ups without losing precision. Figure 1 illustrates both standard verification and the concept of proof sharing.

In standard verification, an input region I_1(x) (orange square) is propagated from left to right, yielding intermediate shapes at each intermediate layer (here, the goal is to verify that all points in the input region are classified as “cat” by the neural network N). We observe that the abstraction obtained for a new region I_2(x) (e.g., blue shapes) can be contained inside existing abstractions from I_1(x), an effect we term proof subsumption. This effect can be observed both between abstractions obtained from different specifications (e.g., ℓ∞ and adversarial patches) for the same data point and between proofs for the same property but different, yet semantically similar inputs. Building on this observation, we introduce the notion of proof sharing via templates. Proof sharing works in two steps: first, we leverage abstractions from existing proofs in order to create templates, and second, we augment the verifier with these templates, stopping the expensive propagation at an intermediate layer as soon as the newly generated abstraction is included inside an existing template. Key technical ingredients to the effectiveness of our approach are fast template generation and inclusion checking techniques. We experimentally demonstrate that proof sharing can achieve significant speed-ups in challenging scenarios, including proving robustness to adversarial patches [10] and geometric perturbations [3] across different neural network architectures.


Main Contributions. Our key contributions are: 


— An introduction and formalization of the concept of proof sharing in neural 
network verification: the idea that some proofs capture others (Sect. 3). 

— A general framework leveraging the above concept, enabling proof effort reuse 
via proof templates (Sect. 4). 

— A thorough experimental evaluation involving verification of neural network 
robustness against challenging adversarial patch and geometric perturbations, 
demonstrating that our methods can achieve proof match rates of up to 95% as
well as provide non-trivial end-to-end certification speed-ups (Sect. 5). 
Fig. 1. Visualization of neural network verification. The input regions I_1(x), I_2(x) are propagated layer by layer through a neural network N. The high-dimensional convex shapes are visualized in 2D. While initially I_1(x) and I_2(x) only slightly overlap, at layer k, N_{1:k}(I_2(x)) is fully contained in N_{1:k}(I_1(x)). (Color figure online)


2 Background 
Here we formally introduce the necessary background for proof sharing. 


Neural Network. A neural network N is a function N: ℝ^{d_in} → ℝ^{d_out}, commonly built from individual layers N = N_L ∘ N_{L−1} ∘ ⋯ ∘ N_1. Throughout this text, we consider feed-forward neural networks, where each layer N_i(x) = max(Ax + b, 0) consists of an affine transformation (Ax + b) followed by a rectified linear unit (ReLU), which applies the max with 0 elementwise. A neural network classifying inputs into c classes outputs d_out := c scores, one for each class, and assigns the class with the highest score as the predicted one. While, as is common in the neural network verification literature, we use image classification as a proxy task, many other applications work analogously. Our approach also naturally extends to other types of neural networks, provided verifiers exist for these architectures. We discuss the challenges and limitations of such generalizations in Sect. 4.5. In the following, for k < L, we let N_{1:k} denote the application of the first k layers and N_{k+1:L} the application of the last L − k layers.


(Local) Neural Network Verification. Given a set of inputs and a postcondition ψ, the goal of neural network verification is to prove that ψ holds over the output of the neural network for the given set of inputs. In this work, we focus on local verification, proving that ψ holds for the network output for a given region I(x) ⊆ ℝ^{d_in} formed around the input x. Formally, we state this as:


Problem 1 (Local neural network verification). For a region I(x) ⊆ ℝ^{d_in}, neural network N, and postcondition ψ, verify that ∀z ∈ I(x). N(z) ⊨ ψ. We write I(x) ⊨ ψ if ∀z ∈ I(x). N(z) ⊨ ψ.


Here, we restrict ourselves to verifiers based on abstract interpretation [11,14], as they achieve state-of-the-art precision and scalability [31,32]. Further, many other popular verifiers [38,42] can be formulated using abstract interpretation. These verifiers propagate I(x) symbolically through the network N layer by layer using abstract transformers, which overapproximate the effect of applying the transformations defined in the different layers on symbolic shapes. The propagation yields an abstraction of the exact shape at each layer. The verifiers finally check whether the abstracted output implies ψ. This is showcased in Fig. 1, where the input regions I_1(x) and I_2(x) are propagated layer by layer through N.

For a verifier V, we let V(I(x), N) denote the abstraction obtained after the propagation of I(x) through the network N. We declutter notation by overloading N and writing N(I(x)) for the same if V is clear from context, i.e., V(I(x), N) = N(I(x)).
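For the Box (interval) domain, which Sect. 3.1 uses for its experiment, the abstract transformers for layers N_i(x) = max(Ax + b, 0) are short enough to write out. The sketch below is a minimal pure-Python illustration with hypothetical names, not the verifier used in this work. It also includes a sufficient check for the classification-invariance postcondition ψ.

```python
from typing import List, Sequence, Tuple

Box = Tuple[List[float], List[float]]  # (lower bounds, upper bounds), one pair per neuron
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (A, b)


def affine_box(box: Box, A: Sequence[Sequence[float]], b: Sequence[float]) -> Box:
    """Box transformer for z -> A z + b: each weight picks the bound that minimises/maximises."""
    lo, hi = box
    new_lo = [bias + sum(w * (l if w >= 0 else h) for w, l, h in zip(row, lo, hi))
              for row, bias in zip(A, b)]
    new_hi = [bias + sum(w * (h if w >= 0 else l) for w, l, h in zip(row, lo, hi))
              for row, bias in zip(A, b)]
    return new_lo, new_hi


def relu_box(box: Box) -> Box:
    """ReLU is monotone, so it can be applied to both bounds directly."""
    lo, hi = box
    return [max(l, 0.0) for l in lo], [max(h, 0.0) for h in hi]


def propagate(box: Box, layers: Sequence[Layer]) -> Box:
    """Overapproximate N(I(x)) for N = N_L o ... o N_1 with N_i(x) = max(A x + b, 0)."""
    for A, b in layers:
        box = relu_box(affine_box(box, A, b))
    return box


def certifies_class(out: Box, target: int) -> bool:
    """Sufficient check for psi: the target score's lower bound beats every other upper bound."""
    lo, hi = out
    return all(lo[target] > hi[j] for j in range(len(hi)) if j != target)


x, eps = [1.0, 2.0], 1.0
region = ([v - eps for v in x], [v + eps for v in x])  # I_eps(x) as a box
out = propagate(region, [([[1.0, -1.0], [2.0, 1.0]], [0.0, -1.0])])
print(out)  # ([0.0, 0.0], [1.0, 6.0])
```

Stopping the loop after k layers yields N_{1:k}(I(x)), the intermediate abstraction that proof sharing compares against templates.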

We consider robustness verification, where the goal is to prove that the network classification does not change within an input region. A common input region is the ℓ∞-bounded additive noise, defined as I_ε(x) := {z | ‖x − z‖∞ ≤ ε}. Here, ε defines the size of the maximal perturbation to x. The postcondition ψ denotes classification to the same class as x. Throughout this paper, we consider different instantiations for I(x) but assume that ψ denotes classification invariance (although other choices would work analogously). Due to this, we refer to I(x) as input region and specification interchangeably. For example, in Fig. 1, the goal is to verify that all points contained in N(I_1(x)) are classified as “cat”.


3 Proof Sharing with Templates 


Before introducing our framework for proof sharing, we further expand the motivating example discussed in Fig. 1.


3.1 Motivation: Proof Subsumption 


As stated earlier, we empirically observed that for many input regions I_i(x) and I_j(x), the abstraction corresponding to one region at some intermediate layer k contains that of the other. Formally:


Definition 1 (Proof Subsumption). For specifications I_i(x), I_j(x), we say that the proof of I_i(x) subsumes that of I_j(x) if at some layer k, N_{1:k}(I_j(x)) ⊆ N_{1:k}(I_i(x)), which we denote as I_j(x) ⊆_{N,k} I_i(x).
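Definition 1 is domain-independent. For the Box domain, the inclusion check reduces to comparing bounds neuron by neuron, while richer domains generally need more involved inclusion checks. A minimal sketch with hypothetical names, where the two boxes stand for N_{1:k}(I_j(x)) and N_{1:k}(I_i(x)):

```python
from typing import List, Tuple

Box = Tuple[List[float], List[float]]  # (lower bounds, upper bounds) per neuron at layer k


def box_contained(inner: Box, outer: Box) -> bool:
    """True iff inner is a subset of outer, i.e. the proof for `outer` subsumes `inner` at this layer."""
    (in_lo, in_hi), (out_lo, out_hi) = inner, outer
    return all(ol <= il and ih <= oh
               for il, ih, ol, oh in zip(in_lo, in_hi, out_lo, out_hi))


outer = ([0.0, 0.0], [2.0, 2.0])
print(box_contained(([0.5, 1.0], [1.0, 2.0]), outer))   # True
print(box_contained(([-0.1, 1.0], [1.0, 2.0]), outer))  # False
```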


Fig. 2. Example of an MNIST image. I^{2×2}_{i,j}(x) signifies arbitrary change in the outlined area.

While not formally required, particularly interesting are cases where proof subsumption occurs despite I_j(x) ⊄ I_i(x). This form of proof subsumption is showcased in Fig. 1, where I_1(x) and I_2(x) overlap only slightly, yet I_2(x) ⊆_{N,k} I_1(x). For another example, consider a neural network N trained as a hand-written digit classifier for the MNIST dataset [22] (an example is shown in Fig. 2) and the following two specifications:


— ℓ∞-bounded perturbations: all the pixels in an input image can be changed arbitrarily and independently by a small amount, I_ε(x) := {z | ‖x − z‖∞ ≤ ε},
Table 1. Proof subsumption on a robust MNIST classifier with 94% accuracy. Verif. acc. denotes the percentage of verifiable inputs from the test set for ℓ∞-perturbations (I_ε).

ε     verif. acc. for I_ε [%]   I^{2×2}_{i,j}(x) ⊆_{N,k} I_ε(x) at layer k [%]
                                k = 1    k = 2    k = 3    k = 4    k = 5
0.1   89.74                     61.40    72.85    77.65    81.75    82.70
0.2   81.40                     62.85    77.05    82.40    86.05    86.60


Fig. 3. The abstraction obtained for I_ε(x) (blue) contains that for I^{2×2}_{i,j}(x) (orange) (projected to d = 2). (Color figure online)


— adversarial patches [10]: a p × p patch, inside which the pixel intensity can vary arbitrarily, is placed on an image at coordinates (i, j), for which we write I^{p×p}_{i,j}(x). We showcase a patch in Fig. 2 and define patches formally in Sect. 4.3.
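Both specifications can be written as boxes over the flattened image. The sketch below assumes pixel intensities normalized to [0, 1], a row-major flattening, and (i, j) as the top-left corner of the patch; the formal definition in Sect. 4.3 may differ in such details, and all names are hypothetical. It also shows in the input space why a patch region is generally not contained in an ℓ∞ region unless ε = 1.

```python
from typing import List, Tuple

Box = Tuple[List[float], List[float]]  # (lower, upper) per pixel, row-major


def linf_region(image: List[List[float]], eps: float) -> Box:
    """I_eps(x) = {z : ||x - z||_inf <= eps}."""
    flat = [v for row in image for v in row]
    return [v - eps for v in flat], [v + eps for v in flat]


def patch_region(image: List[List[float]], i: int, j: int, p: int) -> Box:
    """I^{p x p}_{i,j}(x): pixels inside the patch range over [0, 1]; all others stay fixed."""
    rows, cols = len(image), len(image[0])
    if not (0 <= i <= rows - p and 0 <= j <= cols - p):
        raise ValueError("the patch must lie inside the image")
    lo, hi = [], []
    for r in range(rows):
        for c in range(cols):
            inside = i <= r < i + p and j <= c < j + p
            lo.append(0.0 if inside else image[r][c])
            hi.append(1.0 if inside else image[r][c])
    return lo, hi


def contained(inner: Box, outer: Box) -> bool:
    return all(ol <= il and ih <= oh
               for il, ih, ol, oh in zip(inner[0], inner[1], outer[0], outer[1]))


image = [[0.5] * 3 for _ in range(3)]
patch = patch_region(image, 1, 1, 2)
print(contained(patch, linf_region(image, 0.1)))  # False
print(contained(patch, linf_region(image, 1.0)))  # True
```

Proof subsumption concerns containment after k layers, which is why it can hold even when, as here, containment fails in the input space.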


Clearly, I^{2×2}_{i,j}(x) ⊄ I_ε(x) (unless ε = 1). In Table 1, we show that for a classifier (5 layers with 100 neurons each) we indeed observe proof subsumption. We report the accuracy, i.e., the rate of correct predictions on the unperturbed test data, as well as the certified accuracy, i.e., the rate of samples x for which the prediction is correct and I_ε(x) ⊨ ψ is verified, for I_ε with ε = 0.1 and 0.2 over the whole test set. We also show the percentage of I^{2×2}_{i,j}(x) contained in I_ε(x) at layer k. To this end, we pick 1000 random x for which I_ε(x) is verifiable and sample 2 (i, j) pairs each. We utilize a Box domain verifier and a robustly trained network [24]. Figure 3 shows a patch specification I^{2×2}_{i,j}(x) (in orange) contained in the ℓ∞ specification I_ε (in blue), projected to 2 dimensions via PCA.


Reasons for Proof Subsumption. In Table 1, we observe that the rate of proof subsumption increases with larger ε and k. These observations give an intuition as to why we observe proof subsumption. First, as input regions pass through the neural network, the abstractions become more imprecise in each layer. While this fundamentally limits verification, it makes the subsumption of abstractions more probable. This effect increases when increasing ε for I_ε. Second, and more fundamentally, we observed that semantically similar yet distinct image inputs, e.g., two similar-looking hand-written digits, have activation vectors that grow closer in ℓ2 norm as they pass through the layers of the neural network [21,34]. This effect is a consequence of the neural network distilling low-level information (e.g., individual pixel values) into high-level concepts (e.g., the classes of digits). As specifications (and their proofs) correspond to sets of concrete inputs, a similar effect may apply. We conjecture that these two effects drive the observed proof subsumption.


3.2 Proof Sharing with Templates 


Leveraging this insight, we introduce the idea of proof sharing via templates, showcased in Fig. 4. We use an abstraction N_{1:k}(I_1(x)) obtained from a robustness proof at layer k to create a template T. After ensuring that T is verifiable, it can be used to shortcut the verification of other regions, e.g., of I_2(x), ..., I_5(x). Formally, we decompose proof sharing into two sub-problems: (i) the generation of proof templates and (ii) the matching of abstractions corresponding to other properties to these templates. For simplicity, here we only consider templates at a single layer k of the neural network; we show an extension to multiple layers in Sect. 4.3.


Fig. 4. Conceptualization of proof sharing with templates. (a) We generate a (verifiable) template T from the abstraction obtained by propagating the orange input. (b) We shortcut the verification if intermediate abstractions are contained in T. In (a), we create a verifiable template T (black-dashed border) from specification N_{1:k}(I_1(x)). When verifying new specifications I_2, ..., I_5, shown in (b), we can shortcut the verification of all but I_5 by subsuming them in T.

Our goal is to construct a template $T$ at layer $k$ that implies the postcondition and captures the abstractions at layer $k$ obtained from propagating several $\mathcal{I}_i(x)$. As it is challenging to find a single $T$ that captures abstractions corresponding to many input regions yet remains verifiable, we allow a set of templates $\mathcal{T}$. We state this formally as:


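Problem 2 (Template Generation). For a given neural network $N$, input $x$, set of specifications $\mathcal{I}_1, \dots, \mathcal{I}_r$, layer $k$ and postcondition $\psi$, find a set of templates $\mathcal{T}$ with $|\mathcal{T}| \leq m$ such that:

$$
\begin{aligned}
&\arg\max_{\mathcal{T}} \sum_{i=1}^{r} \bigvee_{T \in \mathcal{T}} \left[ N_{1:k}(\mathcal{I}_i(x)) \subseteq T \right] \\
&\text{s.t. } \forall T \in \mathcal{T}.\; N_{k+1:L}(T) \vdash \psi
\end{aligned}
\tag{1}
$$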


Intuitively, Eq. (1) aims to find a set $\mathcal{T}$ of templates $T$ at layer $k$ such that the maximal number (via the sum) of specifications $\mathcal{I}_1, \dots, \mathcal{I}_r$ have their abstractions contained in at least one template $T$ (via the disjunction), while ensuring that the individual $T$ are still verifiable (via the constraint on the second line). As neural network verification, which the constraints of Eq. (1) require, is NP-complete [17], computing an exact solution to Problem 2 is intractable in general. Therefore, we compute an approximate solution to Eq. (1). In general, Problem 2 does not necessarily require that the templates $T$ are created from previous proofs. However, building on proof subsumption as discussed in Sect. 3.1, in Sect. 4 we will infer the templates from previously obtained abstractions.
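To make Eq. (1) concrete, the following is a minimal sketch under simplifying assumptions of ours: every abstraction $N_{1:k}(\mathcal{I}_i(x))$ and every template is an axis-aligned box given as a (lower, upper) pair of NumPy arrays, and the constraint $N_{k+1:L}(T) \vdash \psi$ is passed in as one precomputed boolean per template. The names `box_contains` and `eq1_objective` are hypothetical. The function only evaluates the objective for one candidate set $\mathcal{T}$; it does not search for the maximizer.

```python
from typing import Sequence, Tuple

import numpy as np

# Assumption: a box is a (lower, upper) pair of equally shaped bound arrays.
Box = Tuple[np.ndarray, np.ndarray]


def box_contains(outer: Box, inner: Box) -> bool:
    """Return True if the box `inner` lies entirely inside the box `outer`."""
    return bool(np.all(outer[0] <= inner[0]) and np.all(inner[1] <= outer[1]))


def eq1_objective(abstractions: Sequence[Box], templates: Sequence[Box],
                  templates_verified: Sequence[bool]) -> int:
    """Evaluate the objective of Eq. (1) for one candidate template set.

    Counts the abstractions N_{1:k}(I_i(x)) contained in at least one
    template (the sum over the disjunction). Returns -1 if any template
    violates the constraint N_{k+1:L}(T) |- psi, i.e. the set is infeasible.
    """
    if not all(templates_verified):
        return -1
    return sum(any(box_contains(t, s) for t in templates) for s in abstractions)


# Toy example in 2 dimensions.
s1 = (np.array([0.1, 0.1]), np.array([0.2, 0.2]))
s2 = (np.array([0.5, 0.5]), np.array([0.9, 0.9]))
t1 = (np.array([0.0, 0.0]), np.array([0.6, 0.6]))
print(eq1_objective([s1, s2], [t1], [True]))  # 1: only s1 is covered
```

Adding a second template that covers `s2` raises the count to 2, which is exactly the trade-off discussed next: more templates cover more specifications but cost more to generate and match.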
To leverage proof sharing once the templates $\mathcal{T}$ are obtained, we need to be able to match an abstraction $S = N_{1:k}(\mathcal{I}(x))$, which we want to verify using proof transfer, to a template in $\mathcal{T}$:


Problem 3 (Template Matching). Given a set of templates $\mathcal{T}$ at layer $k$ of a neural network $N$ and a new input region $\mathcal{I}(x)$, determine whether there exists a $T \in \mathcal{T}$ such that $S \subseteq T$, where $S = N_{1:k}(\mathcal{I}(x))$.


Together, Problems 2 and 3 outline a general framework for proof sharing that permits many instantiations. We note that Problems 2 and 3 present an inherent precision vs. speed trade-off: Problem 3 can be solved most efficiently for small values of $m = |\mathcal{T}|$ and simpler representations of $T$ (allowing faster checking of $S \subseteq T$), at the cost of lower proof matching rates. Alternatively, Eq. (1) can be maximized by a large $m$ and templates $T$ represented by complex abstractions, thus attaining high precision but expensive template generation and matching.


Beyond Proof Sharing on the Same Input. In this section, we focused on proof sharing for different specifications of the same input $x$. However, we observed that proof sharing is even possible between specifications defined on different inputs $x$ and $x'$. To facilitate the use of templates in this setting, Eq. (1) in Problem 2 can be adapted to consider an input distribution.


4 Efficient Verification via Proof Sharing 


We now consider an instantiation of proof sharing where we are given an input $x$ and properties $\mathcal{I}_1, \dots, \mathcal{I}_r$ to verify. Our general approach, based on Problems 2
and 3, is shown in Algorithm 1. In this section, we first discuss Algorithm 1 in 
general. We then describe the possible choices of abstract domains and their 
implications on the algorithm, followed by a discussion on template generation 
for two different specific problems. Finally, we conclude the section with a dis- 
cussion on the conditions for effective proof sharing verification. 

In Algorithm 1, we first create the set of templates $\mathcal{T}$ (Line 1, discussed shortly) and subsequently verify $\mathcal{I}_1, \dots, \mathcal{I}_r$ using $\mathcal{T}$. Here, we consider two, potentially identical, verifiers $V_T$ and $V_S$, where $V_T$ is used to create the templates $\mathcal{T}$ and $V_S$ is used to propagate input regions up to the template layer $k$. We propagate each $\mathcal{I}_i$ up to layer $k$ (Line 4) to obtain $S = N_{1:k}(\mathcal{I}_i(x))$ and check whether we can match it to a template $T_j \in \mathcal{T}$ (Line 6) using an inclusion check. If a match is found, we conclude that $N(\mathcal{I}_i(x)) \vdash \psi$ and set the verification output $v_i$ to True. If this is not the case (Line 11), we verify $N(\mathcal{I}_i(x)) \vdash \psi$ directly by checking $V_S(S, N_{k+1:L}) \vdash \psi$. If the template generation fails, we revert to verifying $\mathcal{I}_i$ by applying $V_S$ in the usual way (omitted in Algorithm 1).


Soundness. As long as the templates $\mathcal{T}$ are sound, this procedure is sound, i.e., Algorithm 1 only returns $v_i = \text{True}$ if $\forall z \in \mathcal{I}_i(x).\; N(z) \vdash \psi$ holds. Formally:

Theorem 1. Algorithm 1 is sound if $\forall T \in \mathcal{T}, z \in T.\; N_{k+1:L}(z) \vdash \psi$ and $V_S$ is sound.
This holds by the construction of the algorithm:

Proof. For a given $x$ and $\mathcal{I}_i$, Algorithm 1 only claims $v_i = \text{True}$ if either (i) the inclusion check in Line 6 or (ii) the fallback check in Lines 11 and 12 succeeds. Since $V_S$ is sound, we know that $\forall z \in \mathcal{I}_i(x).\; N_{1:k}(z) \in S$. Therefore, in case (i), it follows from our requirement on $\mathcal{T}$ together with $S \subseteq T$ that $\forall z \in \mathcal{I}_i(x).\; N(z) \vdash \psi$. In case (ii), we execute Line 12 and the same property holds due to the soundness of $V_S$.

Algorithm 1: Neural Network Verification Utilizing Proof Templates
Input: x, I_1, ..., I_r, k, ψ, verifiers V_S, V_T
Result: v_1, ..., v_r indicating v_i = (N(I_i(x)) ⊢ ψ)
 1  𝒯 ← GEN_TEMPLATES(x, N, k, ψ, V_S, V_T)
 2  v_1, ..., v_r ← False
 3  for i ← 1 to r do
 4      S ← V_S(I_i(x), N_{1:k})
 5      for T_j ∈ 𝒯 do
 6          if S ⊆ T_j then
 7              v_i ← True
 8              break
 9          end
10      end
11      if ¬v_i then
12          v_i ← (V_S(S, N_{k+1:L}) ⊢ ψ)
13      end
14  end
15  return v_1, ..., v_r

Importantly, Theorem 1 shows that the generation process of $\mathcal{T}$ does not affect the overall soundness as long as the set of templates $\mathcal{T}$ fulfills the condition in Theorem 1. In particular, this means that when solving Problem 2, it suffices to show that the side condition ($\forall T \in \mathcal{T}.\; N_{k+1:L}(T) \vdash \psi$) holds, while heuristically approximating the actual optimization criterion. We let $V_T$ denote the verifier used to ensure this property in GEN_TEMPLATES.
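The control flow of Algorithm 1 (Lines 2 to 15) can be sketched in Python as follows. This is an illustration under our own assumptions, not the authors' implementation: the verifier is exposed as three caller-supplied functions with hypothetical names (`propagate_to_k` for Line 4, `contained_in` for the inclusion check in Line 6, `verify_tail` for Line 12), and GEN_TEMPLATES (Line 1) is not implemented; the templates are passed in and must satisfy the condition of Theorem 1.

```python
from typing import Callable, List, Sequence, TypeVar

Spec = TypeVar("Spec")    # an input specification I_i(x)
Shape = TypeVar("Shape")  # an abstraction at layer k (S or a template T_j)


def verify_with_templates(
    specs: Sequence[Spec],
    templates: Sequence[Shape],
    propagate_to_k: Callable[[Spec], Shape],
    contained_in: Callable[[Shape, Shape], bool],
    verify_tail: Callable[[Shape], bool],
) -> List[bool]:
    """Lines 2-15 of Algorithm 1; the templates (Line 1) are given."""
    results = [False] * len(specs)                # Line 2
    for i, spec in enumerate(specs):              # Line 3
        s = propagate_to_k(spec)                  # Line 4: S = V_S(I_i(x), N_{1:k})
        for t in templates:                       # Lines 5-10
            if contained_in(s, t):                # Line 6: S is a subset of T_j
                results[i] = True                 # Line 7
                break
        if not results[i]:                        # Line 11
            results[i] = verify_tail(s)           # Line 12: V_S(S, N_{k+1:L}) |- psi
    return results


# Toy 1-D instance: the "first k layers" double an input interval and
# psi requires the output to be strictly positive.
tail_calls = []


def propagate(iv):
    return (2 * iv[0], 2 * iv[1])


def contained(s, t):
    return t[0] <= s[0] and s[1] <= t[1]


def tail(s):
    tail_calls.append(s)
    return s[0] > 0


template = (0.1, 1.0)  # every point in it satisfies psi, as Theorem 1 requires
specs = [(0.1, 0.3), (0.4, 0.6), (-0.1, 0.1)]
result = verify_with_templates(specs, [template], propagate, contained, tail)
print(result)  # [True, True, False]; only the last two reach verify_tail
```

The first specification is shortcut by the template, so `verify_tail` runs only twice; this saved tail verification is the source of the speed-up analyzed below.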


Precision. We say a verifier $V_1$ is more precise than another verifier $V_2$ on $N$ if, out of a set of specifications, it can verify some that $V_2$ cannot.

Theorem 2. If $V_S(V_S(\mathcal{I}_i(x), N_{1:k}), N_{k+1:L}) = V_S(\mathcal{I}_i(x), N)$, then Algorithm 1 is at least as precise as $V_S$.

Proof. Even if the inclusion check in Line 6 fails, in Line 12 we output $v_i = \left(V_S(V_S(\mathcal{I}_i(x), N_{1:k}), N_{k+1:L}) \vdash \psi\right)$, which by our requirement equals $v_i = \left(V_S(\mathcal{I}_i(x), N) \vdash \psi\right)$. Therefore, we have at least the precision of $V_S$.


The required property holds for any verifier $V_S$ for which the abstraction of each network layer depends only on the abstraction of the preceding layer, and it is fulfilled by all verifiers considered in this paper. For verifiers $V_S$ that do not fulfill the required property, potential losses in precision can be remedied (at the cost of runtime) by using $V_S(\mathcal{I}_i(x), N)$ in Line 12. Interestingly, it is even possible to increase the precision of Algorithm 1 over $V_S$ by creating templates $T$ that are verified with a more precise verifier $V_T$. However, in this discussion, we restrict ourselves to speed gains. We believe that obtaining precision gains requires instantiating our framework with a significantly different approach than the one taken for improving speed, which is the main focus of our work. We leave this as an interesting item for future work.
Run-Time. Here, we aim to characterize the run-time of Algorithm 1 as well as its speed-up over conventional verification. For an input $x$ (keeping the other parameters fixed), the expected run-time is

$$t_{PS} = t_T + r\left(t_S + t_C + (1 - p)\, t_\psi\right) \tag{2}$$

where $t_T$ is the expected time required to generate the templates in Line 1, $r$ is the number of specifications to be verified, $t_S$ is the expected time to compute $S$ (Line 4), $t_C$ is the time to check $S \subseteq T$ for $T \in \mathcal{T}$ until a match is found (Lines 5 to 10), $p \in [0, 1]$ is the rate of specifications for which a matching template is found, and $t_\psi$ is the time required to check $\psi$ on the network output corresponding to $S$ (Line 12). This time is minimized if the individual expected run-times $t_T$, $t_S$, $t_C$ and $t_\psi$ are minimal and $p$ is large (i.e., close to 1). Unfortunately, computing the template match rate $p$ analytically is challenging and requires global reasoning over the neural network for all valid inputs, which are not clearly defined. However, our empirical analysis (in Sect. 5) shows that $p$ is higher when templates are created at later layers (as in Sect. 3.1).
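Eq. (2) can be evaluated directly. The sketch below is ours, with a hypothetical function name, and the timings in the example are made up for illustration only; it makes visible that only the $(1 - p)\,t_\psi$ term depends on the matching rate, so a higher $p$ can never increase the expected run-time.

```python
def expected_runtime_ps(t_T: float, r: int, t_S: float, t_C: float,
                        t_psi: float, p: float) -> float:
    """Expected run-time t_PS of Algorithm 1 according to Eq. (2)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("the matching rate p must lie in [0, 1]")
    return t_T + r * (t_S + t_C + (1.0 - p) * t_psi)


# Hypothetical timings in seconds, chosen only for illustration.
for p in (0.0, 0.5, 1.0):
    print(p, expected_runtime_ps(t_T=0.2, r=729, t_S=0.001, t_C=0.0001,
                                 t_psi=0.002, p=p))
```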

To determine the speed-up compared to a standard baseline verifier, we make the simplifying assumption that there is a single verifier $V = V_S = V_T$ that has expected run-time $v$ for each layer. Thus, the expected run-time of the conventional verifier is $t_{BL} = rLv$. We have $t_T = \lambda m L v$, $t_S = kv$, $t_\psi = (L - k)v$ and $t_C = \eta m$, and, substituting into Eq. (2), ultimately

$$t_{PS} = \left(\lambda m + r(1 - p)\right) L v + r p k v + r \eta m$$

for constants $\lambda \in \mathbb{R}_{>0}$, which indicates the overhead of generating one template over just verifying it, and $\eta \in \mathbb{R}_{>0}$, which denotes the time required to perform an inclusion check for one template. As this formulation shows, Algorithm 1 has the same asymptotic runtime as the base verifier $V$. Further, it allows us to write our expected speed-up as

$$\frac{t_{BL}}{t_{PS}} = \frac{r L v}{\left(\lambda m + r(1 - p)\right) L v + r p k v + r \eta m}.$$

This speed-up is maximized when $k$ is small compared to $L$, i.e., templates are placed early in the neural network, the matching rate $p$ is close to 1, and $m$, $\lambda$ and $\eta$ are small, i.e., generation and matching are fast. Unfortunately, these requirements are at odds with each other: as we show in Sect. 5, a higher $m$ leads to a higher matching rate $p$, and $p$ is naturally higher for templates later in the neural network (higher $k$). Thus, high speed-ups require careful hyperparameter choices.
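The cost model above can be checked numerically. In this sketch (hypothetical function names, all parameter values invented), `speedup` plugs the single-verifier costs into Eq. (2), and `t_ps_closed_form` evaluates the simplified expression; both should agree. The example also shows that with $p = 1$, $\eta = 0$ and growing $r$, the speed-up approaches $L / k$.

```python
def speedup(r: int, L: int, k: int, m: int, p: float, v: float,
            lam: float, eta: float) -> float:
    """Expected speed-up t_BL / t_PS under the single-verifier cost model."""
    t_bl = r * L * v                                    # conventional verifier
    t_T, t_S, t_psi, t_C = lam * m * L * v, k * v, (L - k) * v, eta * m
    t_ps = t_T + r * (t_S + t_C + (1 - p) * t_psi)      # Eq. (2)
    return t_bl / t_ps


def t_ps_closed_form(r, L, k, m, p, v, lam, eta):
    """(lam*m + r(1-p)) L v + r p k v + r eta m, as derived in the text."""
    return (lam * m + r * (1 - p)) * L * v + r * p * k * v + r * eta * m


# Perfect, free matching: the speed-up tends to L / k = 3.5 as r grows.
for r in (10, 100, 10_000):
    print(r, round(speedup(r=r, L=7, k=2, m=1, p=1.0, v=1.0, lam=1.0, eta=0.0), 3))
```

With $p = 0$, the template generation is pure overhead and the "speed-up" drops below 1, which is why the matching rate matters so much.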

To showcase how we can achieve good templates as well as fast matching, 
we next discuss the choice of the abstract domain to be used in the propagation 
and the representation of the templates. Then we discuss the template genera- 
tion procedure and instantiate it for the verification of robustness to adversarial 
patches and geometric perturbations. 


4.1 Choice of Abstract Domain 


To solve Problems 2 and 3 in a way that minimizes the expected runtime and 
maximizes the overall precision, the choice of abstract domain is crucial. Here we 
briefly review common choices of abstract domains for neural network verification 
and how they are suited to our problem. Geometrically these domains can be 
thought of as a convex abstraction of the set of vectors representing reachable values at each layer of the neural network. We say that an abstraction $a_1$ is more precise than another abstraction $a_2$ if and only if $a_1 \subseteq a_2$, i.e., all points in $a_1$ occur in $a_2$. Similarly, we say that a domain is more precise than another if it can express all abstractions in the other domain.

The Box (or Interval) domain [14,16,24] abstracts sets in $d$ dimensions as $B = \{a + \operatorname{diag}(d)\, e \mid e \in [-1, 1]^d\}$ with center $a \in \mathbb{R}^d$ and width $d \in \mathbb{R}^d_{\geq 0}$. The Zonotope domain [14,15,24,31,40] uses relaxations $Z$ of the form

$$Z = \{a + A e \mid e \in [-1, 1]^q\}, \tag{3}$$

parametrized with $a \in \mathbb{R}^d$ and $A \in \mathbb{R}^{d \times q}$.


A third common choice are (restricted) convex Polyhedra $P$ [12,32,42]. Here, we consider $P$ to be in the DeepPoly (DP) domain [32,42]. Generally, Boxes are less precise, i.e., certify fewer properties, than Zonotopes or Polyhedra.

For efficient proof sharing, we require a fast inclusion check $S \subseteq T$, which is challenging in our context due to the high dimensionality $d$ of the intermediate neural network layers. While we point the interested reader to [29] for a detailed discussion, we summarize the key results in Table 2. There, ✓ denotes feasibility, i.e., low polynomial runtime (usually $2d$ comparisons, sometimes with an additional matrix multiplication), and ✗ denotes infeasibility, e.g., exponential run-time. If $T$ is a Box, all checks are simple, as it suffices to compute the outer bounding box of $S$ and compare the $2d$ constraints. If $T$ is a DP Polyhedron, these checks require a linear program (LP) to be solved. While the size of this LP permits a low theoretical time complexity in case $S$ is a Box or a DP Polyhedron, in practice we consider calling an LP solver too expensive (denoted as (✓)). If $T$ is a general Zonotope, these checks are generally infeasible, as they require an enumeration of the faces or corners, which is computationally expensive for large $d$ and many generators. While Zonotopes can be encoded as Polyhedra (but not necessarily as DP Polyhedra) and the same LP inclusion check as for $P$ could be used, the resulting LP would require exponentially many variables due to the previously mentioned enumeration. However, by placing constraints on the matrix $A$ in Eq. (3), these inclusion checks can be performed efficiently. The mapping of a Zonotope to such a restricted Zonotope is called order reduction via outer-approximation [19,29].

Table 2. Feasibility of $S \subseteq T$ for Box $B$, Zonotope $Z$ (with order reduction $\alpha(Z)$) and DP Polyhedra $P$.

S \ T    B    Z    α(Z)    P
B        ✓    ✗    ✓       (✓)
Z        ✓    ✗    ✓       ✗
P        ✓    ✗    ✓       (✓)

In particular, for a Zonotope $Z$ we consider the order reduction $\alpha_{\text{Box}}$ to its outer bounding box (where $A$ is diagonal) and note that other choices of $\alpha$ are possible (e.g., the reduction to affine transformations of a hyperbox).

For a general Zonotope $Z$, its outer bounding box $Z' = \alpha_{\text{Box}}(Z)$ can be easily obtained. The center of $Z'$ is $a$, the center of $Z$. The width $d \in \mathbb{R}^d_{\geq 0}$ is given by $d_j = \sum_i |A_{j,i}|$, summing over all generators. $Z'$ is represented as either a Box or a Zonotope (with $A = \operatorname{diag}(d)$). To check $S \subseteq Z'$ for a general Zonotope $S$, it suffices to check $\alpha_{\text{Box}}(S) \subseteq Z'$, which reduces to the simple inclusion check for boxes.
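The bounding-box reduction and the resulting inclusion check fit in a few lines of NumPy. This is a minimal sketch with hypothetical function names, not the DeepZ implementation used in the paper; a Zonotope is given by its center `a` and generator matrix `A` as in Eq. (3), and a Box template by its center and width.

```python
import numpy as np


def alpha_box(a: np.ndarray, A: np.ndarray):
    """Outer bounding box of Z = {a + A e | e in [-1, 1]^q}.

    Returns the center a and the width d with d_j = sum_i |A_{j,i}|,
    i.e. the Zonotope with generator matrix diag(d).
    """
    return a, np.abs(A).sum(axis=1)


def zonotope_in_box(a_S, A_S, a_T, d_T) -> bool:
    """Check S subset of T for a Zonotope S and a Box T via alpha_box(S).

    Needs only 2d comparisons. Because the extreme value of each coordinate
    over S is attained, the check is exact (not just sufficient) for a Box T.
    """
    c, d = alpha_box(a_S, A_S)
    return bool(np.all(c - d >= a_T - d_T) and np.all(c + d <= a_T + d_T))


a_S = np.array([0.0, 0.0])
A_S = np.array([[0.1, 0.2],
                [0.3, -0.1]])
print(alpha_box(a_S, A_S)[1])                                         # widths approx. [0.3, 0.4]
print(zonotope_in_box(a_S, A_S, np.zeros(2), np.array([0.5, 0.5])))   # True
print(zonotope_in_box(a_S, A_S, np.zeros(2), np.array([0.5, 0.35])))  # False
```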

Based on the above discussion, we use the Zonotope domain to represent all abstractions, and verifiers $V_S = V_T$ that propagate these zonotopes using the state-of-the-art DeepZ transformers [31]. To permit efficient inclusion checks, we apply $\alpha_{\text{Box}}$ to the resulting zonotopes to obtain the Box templates $T$, which we treat as a special case of Zonotopes.


4.2 Template Generation 


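We now discuss instantiations of GEN_TEMPLATES in Algorithm 1. Recall from Sect. 3.1 the idea of proof subsumption, i.e., that abstractions for some specifications contain abstractions for other specifications. Building on this, we relax Problem 2 in order to create $m$ templates $T$ from intermediate abstractions $N_{1:k}(\hat{\mathcal{I}}_j(x))$ for some specifications $\hat{\mathcal{I}}_1, \dots, \hat{\mathcal{I}}_m$. Note that the $\hat{\mathcal{I}}_j$ are not necessarily directly related to the specifications $\mathcal{I}_1, \dots, \mathcal{I}_r$ that we want to verify. For a chosen layer $k$, input $x$, number of templates $m$ and verifiers $V_S$ and $V_T$, we optimize

$$
\begin{aligned}
&\arg\max_{\hat{\mathcal{I}}_1, \dots, \hat{\mathcal{I}}_m} \sum_{i=1}^{r} \bigvee_{j=1}^{m} \left[ V_S(\mathcal{I}_i(x), N_{1:k}) \subseteq T_j \right] \\
&\text{where } T_j = \alpha_{\text{Box}}\big(V_T(\hat{\mathcal{I}}_j(x), N_{1:k})\big) \\
&\text{s.t. } V_T(T_j, N_{k+1:L}) \vdash \psi \text{ for } j \in \{1, \dots, m\}
\end{aligned}
\tag{4}
$$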


As originally in Problem 2 (Eq. (1)), we aim to find a set of templates such that the intermediate shapes at layer $k$ for most of the $r$ specifications are covered by at least one template $T$. In contrast to Eq. (1), we tie each $T_j$ to a specification $\hat{\mathcal{I}}_j$. This alone does not make the problem easier to tackle. However, next, we discuss how to generate application-specific parametric $\hat{\mathcal{I}}_j$ and solve Eq. (4) by optimizing over their parameters, which allows us to solve template generation much more efficiently than in Eq. (1).


4.3 Robustness to Adversarial Patches 


We now instantiate the above scheme in order to verify the robustness of image classifiers against adversarial patches [10]. Consider an attacker that is allowed to arbitrarily change any $p \times p$ patch of the image, as showcased earlier in Fig. 2. For such a patch over the pixel positions $[i, i+p-1] \times [j, j+p-1]$, the corresponding perturbation is

$$\mathcal{I}^{i,j}_{p\times p}(x) = \{ z \in [0,1]^{h \times w} \mid z_{\pi^c_{i,j}} = x_{\pi^c_{i,j}} \}, \quad \text{with } \pi_{i,j} = \{(u, v) \mid u \in [i, i+p-1],\ v \in [j, j+p-1]\},$$

where $h$ and $w$ denote the height and width of the input $x$. Here, $\pi_{i,j}$ denotes the part of the image affected by the patch and $\pi^c_{i,j}$ its complement, i.e., the unaffected part of the image. To prove robustness for an arbitrarily placed $p \times p$ patch, however, one must consider the perturbation set $\mathcal{I}_{p\times p}(x) := \bigcup_{i,j} \mathcal{I}^{i,j}_{p\times p}(x)$.

Fig. 5. Example splits $\mu$ for 10 × 10 pixels: (a) $\ell_\infty$, (b) Center + Border, (c) 2 × 2 Grid.

Fig. 6. Example template $T_k = \alpha_{\text{Box}}(N_{1:k}(\hat{\mathcal{I}}_i(x, \epsilon_i)))$. (Color figure online)

To prove robustness for $\mathcal{I}_{p\times p}$, existing approaches [10] separately verify $\mathcal{I}^{i,j}_{p\times p}(x)$ for all $i \in \{1, \dots, h - p + 1\}$, $j \in \{1, \dots, w - p + 1\}$. For example, with $p = 2$ and a $28 \times 28$ MNIST image, this approach requires $27 \cdot 27 = 729$ individual proofs. Because the different proofs for $\mathcal{I}_{p\times p}$ share similarities, this is an ideal candidate for proof sharing. We utilize Algorithm 1 and check $\bigwedge_i v_i$ at the end to speed up this process. For template generation, we solve Eq. (4) for $m$ templates with one input perturbation $\hat{\mathcal{I}}_i$ per template.
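Each $\mathcal{I}^{i,j}_{p\times p}(x)$ is a product of intervals, so it can be represented exactly by lower and upper bounds. The sketch below (hypothetical helper names, 0-indexed positions instead of the 1-indexed ones in the text) enumerates the patch positions and builds these bounds; the count reproduces the 729 MNIST specifications mentioned above and the 961 CIFAR specifications used in Sect. 5.2.

```python
import numpy as np


def patch_positions(h: int, w: int, p: int):
    """All top-left corners (i, j), 0-indexed, of a p x p patch in an h x w image."""
    return [(i, j) for i in range(h - p + 1) for j in range(w - p + 1)]


def patch_bounds(x: np.ndarray, i: int, j: int, p: int):
    """Lower/upper bounds of I^{i,j}_{p x p}(x) for pixel values in [0, 1].

    Pixels inside the patch may take any value in [0, 1]; all others stay
    fixed. The last two axes of x are height and width, so for color
    images the patch covers all channels.
    """
    lo, hi = x.copy(), x.copy()
    lo[..., i:i + p, j:j + p] = 0.0
    hi[..., i:i + p, j:j + p] = 1.0
    return lo, hi


print(len(patch_positions(28, 28, 2)))  # 729 specifications (MNIST)
print(len(patch_positions(32, 32, 2)))  # 961 specifications (CIFAR)
```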

We empirically found (recall Table 1) that setting $\hat{\mathcal{I}}_i$ to an $\ell_\infty$ region $\mathcal{I}_{\epsilon_i}$ captures a majority of the individual patch perturbations $\mathcal{I}^{i,j}_{p\times p}$ at intermediate layers. Specifically, setting $\epsilon_i$ to the maximally verifiable value for this input works particularly well.

To further increase the number of specifications contained in a set of templates $\mathcal{T}$, we use $m$ template perturbations of the form

$$\hat{\mathcal{I}}_i(x) = \{ z \mid \|x_{\mu_i} - z_{\mu_i}\|_\infty \leq \epsilon_i \wedge x_{\mu_i^c} = z_{\mu_i^c} \},$$

where $\mu_i$ denotes a subset of pixels of the input image and $\mu_i^c$ its complement, and we maximize $\epsilon_i$ in a best-effort manner. In particular, we consider masks $\mu_1, \dots, \mu_m$ that partition the set of pixels in the image (e.g., as in Fig. 5).
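The template perturbations $\hat{\mathcal{I}}_i(x)$ are again boxes, now parametrized by a pixel mask $\mu_i$ and a radius $\epsilon_i$. The sketch below (hypothetical names) represents $\mu_i$ as a boolean array and, like the definition above, does not clip the bounds to the valid pixel range; `is_partition` checks the requirement that the masks cover every pixel exactly once. With a single all-true mask ($m = 1$), this reduces to $\mathcal{I}_\epsilon$.

```python
import numpy as np


def template_spec_bounds(x: np.ndarray, mask: np.ndarray, eps: float):
    """Box bounds of I_hat_i(x): pixels in mu_i (mask == True) may move by at
    most eps in l_inf, pixels in the complement stay equal to x.
    Clipping to the valid pixel range is deliberately omitted, as in the
    definition in the text."""
    lo = np.where(mask, x - eps, x)
    hi = np.where(mask, x + eps, x)
    return lo, hi


def is_partition(masks) -> bool:
    """True if every pixel belongs to exactly one of the masks mu_1..mu_m."""
    return bool(np.all(np.stack(masks).astype(int).sum(axis=0) == 1))


x = np.full((4, 4), 0.5)
left = np.zeros((4, 4), dtype=bool)
left[:, :2] = True
lo, hi = template_spec_bounds(x, left, 0.1)
print(lo[0], hi[0])                 # masked pixels move, the others stay at 0.5
print(is_partition([left, ~left]))  # True
```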

As noted earlier, this generation procedure needs to be fast, yet obtain templates $T$ to which many abstractions match in order to yield speed-ups. Thus, we consider a small $m$ and fixed patterns $\mu_1, \dots, \mu_m$. For each $\hat{\mathcal{I}}_i$, we aim to find the largest $\epsilon_i$ that can still be verified in order to maximize the number of matches. Note that for $m = 1$, this is equivalent to the $\ell_\infty$ input perturbation $\mathcal{I}_\epsilon$ with the maximally verifiable $\epsilon$ for the given image.

Concretely, we can perform binary search over $\epsilon_i$ in order to find a large $\epsilon_i$ that still satisfies $N_{k+1:L}(\alpha_{\text{Box}}(N_{1:k}(\hat{\mathcal{I}}_i))) \vdash \psi$. Verification with our chosen DeepZ Zonotopes is not monotonic in $\epsilon_i$ due to the non-monotonic transformers used for non-linearities (e.g., ReLU). This renders the application of binary search a best-effort approximation. As we do not require a formal maximum but rather aim to solve a surrogate for Problem 2, this still works well in practice. Further note that applying $\alpha_{\text{Box}}$ to templates introduces imprecision, i.e., $V_T$ might not be able to prove properties over templates that it could prove without the application of $\alpha_{\text{Box}}$. However, Theorem 2 (which only requires properties of $V_S$) still applies.
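The best-effort character of this binary search can be illustrated with a generic bisection over a black-box predicate. In this sketch (hypothetical names), `is_verified(eps)` stands in for the DeepZ check above; the invariant is that the returned value is always verified, but when verification is not monotone in $\epsilon$, larger verifiable values may be missed. The toy predicate is invented for illustration.

```python
from typing import Callable, Optional


def best_effort_bin_search(is_verified: Callable[[float], bool],
                           lo: float = 0.0, hi: float = 1.0,
                           iters: int = 20) -> Optional[float]:
    """Bisection for a large eps in [lo, hi] with is_verified(eps) == True.

    Invariant: `lo` is always verified and `hi` is not. If verification were
    monotone in eps, the result would approach the true maximum; since it is
    not (e.g. for DeepZ), it is only a best-effort, but verified, value.
    """
    if is_verified(hi):
        return hi
    if not is_verified(lo):
        return None
    for _ in range(iters):
        mid = (lo + hi) / 2
        if is_verified(mid):
            lo = mid
        else:
            hi = mid
    return lo


# A hypothetical, non-monotone verification outcome: verified on [0, 0.2]
# and again on [0.6, 0.7]. Bisection settles near 0.2 and misses 0.7.
def toy(eps: float) -> bool:
    return eps <= 0.2 or 0.6 <= eps <= 0.7


print(best_effort_bin_search(toy))
```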
Templates at Multiple Layers. We can extend this approach to obtain templates at multiple layers without a large increase in computational cost. With templates at multiple layers, we first try to match the propagated shape against the earliest template layer and, upon failure, propagate it further to the next, where we again attempt to match the template. In Algorithm 1, this means repeating the block from Line 4 to Line 10 for each template layer before going on to the check in Line 11.

Algorithm 2: Online Template Generation for Patches
Input: x, N, μ_1, ..., μ_m, K, ψ, V_T
Result: 𝒯^k for k ∈ K
 1  𝒯^k ← {} for k ∈ K
 2  for i ← 1 to m do
 3      Î_i(x, ε) := {z | ‖x_{μ_i} − z_{μ_i}‖_∞ ≤ ε
 4                     ∧ x_{μ_i^c} = z_{μ_i^c}}
 5      f(ε) := V_T(Î_i(x, ε), N) ⊢ ψ
 6      ε_i ← bin_search(ε, f(ε))
 7      for k ∈ K do
 8          T_k ← α_Box(V_T(Î_i(x, ε_i), N_{1:k}))
 9          g(β_k) := V_T(β_k T_k, N_{k+1:L}) ⊢ ψ
10          β_k ← bin_search(β_k, g(β_k))
11          𝒯^k ← 𝒯^k ∪ {β_k T_k}
12      end
13  end
14  return 𝒯^k for k ∈ K

The full template generation procedure is given in Algorithm 2. First, we perform a binary search over $\epsilon_i$ (Line 6) to find the largest $\epsilon_i$ for which the specification is verifiable. Then, for each layer $k$ in the set of layers $K$ at which we create templates, we create a box $T_k$ from the Zonotope. As this $T_k$ may not be verifiable due to the imprecision added by $\alpha_{\text{Box}}$, we then perform another binary search for the largest scaling factor $\beta_k$ (Line 10), which is applied to the matrix $A$ in Eq. (3). We denote this operation as $\beta_k T_k$. We show an example for a single layer $k$ in Fig. 6. The blue area outlines the Zonotope found via Line 6, which is verifiable as it lies fully on one side of the decision boundary (red, dashed). After applying $\alpha_{\text{Box}}$ (orange), however, it is no longer verifiable, as it crosses the decision boundary. By scaling it with $\beta_k$, the shape becomes verifiable again (green) and is used as a template.
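Lines 8 to 10 of Algorithm 2 can be illustrated on a toy problem. This sketch rests on assumptions of ours: the template is a Box with center and width, scaling by $\beta_k$ multiplies its diagonal generator matrix (i.e., its width), the search range for $\beta_k$ is $[0, 1]$, and $V_T(\cdot, N_{k+1:L}) \vdash \psi$ is replaced by a single affine output $w^\top z + b > 0$ checked with interval arithmetic. All names are hypothetical; unlike DeepZ, this toy check is monotone, so the bisection recovers the exact threshold $\beta^* = (w^\top c + b) / (|w|^\top d)$.

```python
import numpy as np


def scale_template(center: np.ndarray, width: np.ndarray, beta: float):
    """beta * T for a Box template: scale the generator matrix A = diag(width)."""
    return center, beta * width


def tail_verified(center, width, w, b) -> bool:
    """Toy stand-in for V_T(., N_{k+1:L}) |- psi: one affine output w.z + b
    must be positive on the whole box (lower bound via interval arithmetic)."""
    return float(w @ center + b - np.abs(w) @ width) > 0.0


def largest_beta(center, width, w, b, beta_max=1.0, iters=30):
    """Bisection for the largest beta in [0, beta_max] keeping beta*T verified."""
    if tail_verified(*scale_template(center, width, beta_max), w, b):
        return beta_max
    if not tail_verified(center, 0 * width, w, b):
        return None                        # even the center point fails
    lo, hi = 0.0, beta_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tail_verified(*scale_template(center, width, mid), w, b):
            lo = mid
        else:
            hi = mid
    return lo


center = np.array([1.0, 0.5])
width = np.array([0.8, 0.8])               # the alpha_Box result is too wide to verify
w, b = np.array([1.0, -1.0]), 0.0
print(largest_beta(center, width, w, b))   # close to 0.5 / 1.6 = 0.3125
```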


4.4 Geometric Robustness 


Geometric robustness verification [3,13,28,32] aims to verify the robustness of neural networks against geometric transformations such as image rotations or translations. These transformations typically include an interpolation operation. For example, consider the rotation $R_\gamma$ of an image by $\gamma \in \Gamma$ degrees for an interval $\Gamma$ (e.g., $\gamma \in [-5, 5]$), for which we consider the specification $\mathcal{I}_\Gamma(x) := \{R_\gamma(x) \mid \gamma \in \Gamma\}$. We note that, unlike for $\ell_\infty$ and patch verification, the input regions for geometric transformations are non-linear and have no closed-form solutions. Thus, an overapproximation of the input region must be obtained [3]. For large $\Gamma$, the approximate input region $\mathcal{I}_\Gamma(x)$ can be too coarse, resulting in imprecise verification. Hence, in order to assert $\psi$ on $\mathcal{I}_\Gamma$, existing state-of-the-art approaches [3] split $\Gamma$ into $r$ smaller ranges $\Gamma_1, \dots, \Gamma_r$ and then verify the resulting $r$ specifications $(\mathcal{I}_{\Gamma_i}, \psi)$ for $i \in \{1, \dots, r\}$. These smaller perturbations share similarities, facilitating proof sharing. We instantiate our approach similarly to Sect. 4.3. A key difference to Sect. 4.3 is that while $x \in \mathcal{I}^{i,j}_{p\times p}(x)$ for all $i, j$ for patches, here in general $x \notin \mathcal{I}_{\Gamma_i}(x)$ for most $i$. Therefore, the individual perturbations $\mathcal{I}_{\Gamma_i}(x)$ do not overlap. To account for this, we consider $m$ templates and split $\Gamma$ into $m$ equally sized chunks (unrelated to the $r$ splits), obtaining the angles $\gamma_1, \dots, \gamma_m$ at the center of each chunk. For $m$ templates, we then consider the perturbations $\hat{\mathcal{I}}_i := \mathcal{I}_{\epsilon_i}(R_{\gamma_i}(x))$, denoting the $\ell_\infty$ perturbation of size $\epsilon_i$ around $x$ rotated by $\gamma_i$ degrees. To find the templates, we employ a procedure analogous to Algorithm 2.
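The bookkeeping behind this instantiation, splitting $\Gamma$ into $r$ verification ranges and into $m$ chunks whose centers $\gamma_1, \dots, \gamma_m$ anchor the templates, is sketched below with hypothetical names. The rotation itself, its interpolation and the input overapproximation of [3] are not implemented; the ±40° and $r = 200$ values mirror the setting of Sect. 5.3.

```python
def split_interval(lo: float, hi: float, n: int):
    """Split [lo, hi] into n equally sized, consecutive sub-intervals."""
    step = (hi - lo) / n
    return [(lo + i * step, lo + (i + 1) * step) for i in range(n)]


def template_angles(lo: float, hi: float, m: int):
    """Centers gamma_1..gamma_m of m equally sized chunks of Gamma = [lo, hi]."""
    return [(a + b) / 2 for a, b in split_interval(lo, hi, m)]


splits = split_interval(-40.0, 40.0, 200)  # r = 200 specifications Gamma_i
print(len(splits), splits[0])              # 200 pieces, each 0.4 degrees wide
print(template_angles(-40.0, 40.0, 3))     # approx. [-26.67, 0.0, 26.67]
```

Because the $m$ chunks are independent of the $r$ splits, $m$ can be tuned separately, which Table 9 exploits.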


4.5 Requirements for Proof Sharing 


Now, we discuss the requirements on the neural network N such that proof 
sharing via templates works well. For simplicity, we discuss simple per-dimension 
box bounds propagation for Vs and Vr. However, similar arguments can be made 
for more complex relational abstractions such as Zonotopes or Polyhedra. 

In order for an abstraction $S$ to match a template $T$, we need to show interval inclusion for each dimension. For a particular dimension $j$, this can occur in two ways: (i) when both $S$ and $T$ are just a point in that dimension and these points coincide, i.e., $d^S_j = d^T_j = 0$ and $a^S_j = a^T_j$, or (ii) when $[a^S_j - d^S_j,\, a^S_j + d^S_j] \subseteq [a^T_j - d^T_j,\, a^T_j + d^T_j]$. While the first case can occur, particularly in ReLU networks after a ReLU layer sets values to zero, we focus our analysis here on the second case as it is more common. In this case, the width $d^T_j$ of $T$ in that dimension must be sufficient to cover $S$. Ignoring case (i) and letting $\operatorname{supp}(T)$ denote the dimensions in which $d^T_j > 0$, we can pose $\operatorname{supp}(S) \subseteq \operatorname{supp}(T)$ as a necessary condition for inclusion. While it is in general hard to argue about the magnitudes of these values, this approach still provides an intuition. Let $T$ be obtained from an input specification $\hat{\mathcal{I}}$ and $S$ from $\mathcal{I}$. When starting from input specifications with $\operatorname{supp}(\mathcal{I}) \not\subseteq \operatorname{supp}(\hat{\mathcal{I}})$, $\operatorname{supp}(S) \subseteq \operatorname{supp}(T)$ can only occur if, during propagation through the neural network $N_{1:k}$, the mass in $\operatorname{supp}(\hat{\mathcal{I}})$ can "spread out" sufficiently to cover $\operatorname{supp}(S)$. In the fully connected neural networks that we discuss here, the matrices of linear layers provide this possibility. However, in networks that only read part of the input at a time, such as recurrent neural networks, or in convolutional neural networks, in which only locally neighboring inputs feed into the respective output in the next layer, these connections do not necessarily exist. This makes proof sharing hard until later layers of the neural network that regionally or globally pool information. As this increases the depth of the layer $k$ at which proof transfer can be applied, it also decreases the potential speed-up of proof transfer. This could be alleviated by different ways of creating templates, which we plan to investigate in the future.
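The "spreading out" argument can be made tangible with Box propagation, where the output width of an affine layer is $|W|\,d$. In this toy sketch (hypothetical names and matrices of our choosing), a dense matrix spreads a width confined to one input dimension to every output dimension after one layer, while a banded matrix, mimicking the local connectivity of a convolution, widens the support by only one neighbor per layer. ReLUs are omitted; a Box ReLU can only shrink the support.

```python
import numpy as np


def propagate_width(W: np.ndarray, width: np.ndarray) -> np.ndarray:
    """Box propagation through an affine layer: output width = |W| @ width."""
    return np.abs(W) @ width


def support(width: np.ndarray) -> set:
    """Dimensions j with strictly positive width d_j."""
    return set(np.flatnonzero(width > 0).tolist())


n = 6
dense = np.ones((n, n))                                     # fully connected layer
local = (np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 1).astype(float)
width = np.zeros(n)
width[0] = 1.0                                              # only input dimension 0 perturbed

print(support(propagate_width(dense, width)))               # {0, 1, 2, 3, 4, 5}
print(support(propagate_width(local, width)))               # {0, 1}
print(support(propagate_width(local, propagate_width(local, width))))  # {0, 1, 2}
```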


5 Experimental Evaluation 


We now experimentally evaluate the effectiveness of our algorithms from Sect. 4. 


5.1 Experimental Setup 


We consider the verification of robustness to adversarial patch attacks and geometric transformations in Sect. 5.2 and Sect. 5.3, respectively. We define specifications on the first 100 test set images each from the MNIST [22] and the CIFAR-10 [20] ("CIFAR") datasets, since the overall runtime becomes high with repetitions and parameter variations. We use DeepZ [31] as the baseline verifier as well as for $V_S$ and $V_T$. Throughout this section, we evaluate proof sharing for two networks on two common datasets: we use a seven-layer neural network with 200 neurons per layer ("7 × 200") and a nine-layer network with 500 neurons per layer ("9 × 500") for both the MNIST [22] and CIFAR [20] datasets, both utilizing ReLU activations. These architectures are similar to the fully connected ones used in the ERAN and mnistfc VNN-COMP categories [2].

Table 3. Rate of $\mathcal{I}^{i,j}_{2\times 2}$ matched to templates $T$ for $\mathcal{I}_{2\times 2}$ patch verification for different template layers $k$, 7 × 200 networks, using $m = 1$ template.

                          Matched [%] with template at layer k
                          1      2      3      4      5      6      7      Patch verif. [%]
MNIST                     18.6   85.6   94.1   95.2   95.5   95.7   95.7   97.0
CIFAR                     0.1    27.1   33.7   34.4   34.2   34.2   34.3   42.2

Table 4. Average verification time in seconds per image for $\mathcal{I}_{2\times 2}$ patches for different combinations of template layers $k$, 7 × 200 networks, using $m = 1$ template.

           Baseline   Proof sharing, template layer k
                      1      2      3      4      1+3    2+3    2+4    2+3+4
MNIST      2.10       1.94   1.15   1.22   1.41   1.27   1.09   1.10   1.14
CIFAR      3.27       2.98   2.53   2.32   2.47   2.35   2.49   2.42   2.55

For MNIST, we train for 100 epochs, enumerating all patch locations for each sample, and for CIFAR we train for 600 epochs with 10 random patch locations, as outlined in [10], using interval training [16,24]. On MNIST, the 7 × 200 and the 9 × 500 networks achieve a natural accuracy of 98.3% and 95.3%, respectively. For CIFAR, these values are 48.8% and 48.1%, respectively. Our implementation utilizes PyTorch [25] and is evaluated on Ubuntu 18.04 with an Intel Core i9-9900K CPU and 64 GB RAM. For all timing results, we report the mean over three runs.


5.2 Robustness Against Adversarial Patches 


For MNIST, which contains 28 × 28 images, 729 individual perturbations must be verified, as outlined in Sect. 4.3, in order to verify inputs to be robust against 2 × 2 patch perturbations. Only if all of them are verified can the overall property be verified for a given image. Similarly, for CIFAR, which contains 32 × 32 color images, there are 961 individual perturbations (the patch is applied over all color channels).

We now investigate the two main parameters of Algorithm 2: the masks $\mu_1, \dots, \mu_m$ and the layers $k \in K$. We first study the impact of the layer $k$ used for creating the template. To this end, we consider the 7 × 200 networks and use $m = 1$ (covering the whole image; equivalent to $\mathcal{I}_\epsilon$). Table 3 shows the corresponding template matching rates and the overall percentage of individual patches that can be verified ("Patch verif."). (The overall percentage of images for which $\mathcal{I}_{2\times 2}$ is verified is reported as "Verif. acc." in Table 6.) Table 4 shows the corresponding verification times (including the template generation). We observe that many template matches can already be made at the second or third layer. As creating templates simultaneously at the second and third layer works well for both datasets, we utilize templates at these layers in further experiments.

Table 5. $\mathcal{I}_{2\times 2}$ patch verification with templates at the 2nd and 3rd layer of the 7 × 200 networks for different masks.

Method/Mask        m    Patch matched [%]    t [s]
Baseline           -    -                    2.14
$\ell_\infty$      1    94.1                 1.11
Center + Border    2    94.6                 1.41
2 × 2 Grid         4    95.0                 3.49

Table 6. $\mathcal{I}_{2\times 2}$ patch verification with templates generated on the second and third layer using the $\ell_\infty$-mask. Verification times are given for the baseline $t^{BL}$ and for applying proof sharing $t^{PS}$ in seconds per image.

Dataset   Net       Verif. acc. [%]   $t^{BL}$   $t^{PS}$   Patch matched [%]   Patch verif. [%]
MNIST     7 × 200   81.0              2.10       1.10       94.1                97.0
          9 × 500   66.0              2.70       1.32       93.0                95.3
CIFAR     7 × 200   29.0              3.28       2.45       33.7                42.2
          9 × 500   28.0              5.48       4.48       34.2                46.2

Next, we investigate the impact of the pixel masks $\mu_1, \dots, \mu_m$. To this end, we consider three different settings, as showcased earlier in Fig. 5: (i) the full image ($\ell_\infty$-mask as before; $m = 1$), (ii) "center + border" ($m = 2$), where we consider the 6 × 6 center pixels as one group and all others as another, and (iii) the 2 × 2 grid ($m = 4$), where we split the image into equally sized quarters.
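The three mask settings can be written as boolean arrays and passed as $\mu_1, \dots, \mu_m$ to a template generator. This sketch uses hypothetical names and assumes that the 6 × 6 block is centered in the image, which the text does not state explicitly. For a 28 × 28 MNIST image it prints each setting's number of masks, their pixel counts, and whether they partition the image.

```python
import numpy as np


def linf_mask(h: int, w: int):
    """(i) m = 1: a single mask covering the whole image."""
    return [np.ones((h, w), dtype=bool)]


def center_border_masks(h: int, w: int, c: int = 6):
    """(ii) m = 2: the c x c center pixels (assumed centered) and all others."""
    center = np.zeros((h, w), dtype=bool)
    top, left = (h - c) // 2, (w - c) // 2
    center[top:top + c, left:left + c] = True
    return [center, ~center]


def grid_masks(h: int, w: int):
    """(iii) m = 4: the four equally sized quarters of the image."""
    masks = []
    for rows in (slice(0, h // 2), slice(h // 2, h)):
        for cols in (slice(0, w // 2), slice(w // 2, w)):
            mask = np.zeros((h, w), dtype=bool)
            mask[rows, cols] = True
            masks.append(mask)
    return masks


for masks in (linf_mask(28, 28), center_border_masks(28, 28), grid_masks(28, 28)):
    coverage = np.stack(masks).sum(axis=0)
    print(len(masks), [int(mk.sum()) for mk in masks], bool(np.all(coverage == 1)))
```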

As we can see in Table 5, for higher $m$ more patches can be matched to the templates, indicating that our optimization procedure is a good approximation to Problem 2, which only considers the number of matched specifications. Yet, for $m > 1$ the increase in matching rate $p$ does not offset the additional time spent on template generation and matching. Thus, $m = 1$ results in a better trade-off. This result highlights the trade-offs discussed throughout Sect. 3 and Sect. 4. Based on this investigation, we now evaluate all networks and datasets in Table 6, using $m = 1$ and template generation at layers 2 and 3. In all cases, we obtain a speed-up of roughly 1.2× to 2× over the baseline verifier. Going from 2 × 2 to 3 × 3 patches, the speed-ups remain around 1.6× and 1.3× for the two datasets, respectively.
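The speed-up range quoted above can be recomputed from the run-times in Table 6 as $t^{BL} / t^{PS}$. The snippet below only copies the table's numbers; the MNIST 9 × 500 network comes out slightly above 2×.

```python
# (t_BL, t_PS) in seconds per image, copied from Table 6.
table6 = {
    ("MNIST", "7x200"): (2.10, 1.10),
    ("MNIST", "9x500"): (2.70, 1.32),
    ("CIFAR", "7x200"): (3.28, 2.45),
    ("CIFAR", "9x500"): (5.48, 4.48),
}
speedups = {key: t_bl / t_ps for key, (t_bl, t_ps) in table6.items()}
for key, s in speedups.items():
    print(*key, f"{s:.2f}x")
# MNIST 7x200 1.91x / MNIST 9x500 2.05x / CIFAR 7x200 1.34x / CIFAR 9x500 1.22x
```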
Table 7. Speed-ups achievable in the setting of Table 3; $t^{BL}$ denotes the baseline run-time.

                                                                 Speed-up at layer k
Setting                     Ratio                                1      2      3      4
Realized                    $t^{BL}/t^{PS}$                      1.08   1.83   1.72   1.49
Optimal                     $t^{BL}/(t_T + r t_S + r t_C)$       3.75   2.51   1.92   1.56
Optimal, no C               $t^{BL}/(t_T + r t_S)$               4.02   2.68   2.01   1.62
Optimal, no gen. T, no C    $t^{BL}/(r t_S)$                     4.57   2.90   2.13   1.69


Comparison with Theoretically Achievable Speed-Up. Finally, we want to determine the maximal possible speed-up with proof sharing and see how much of this potential is realized by our method. To this end, we investigate the same setting and network as in Table 3. We let $t^{BL}$ and $t^{PS}$ denote the runtime of the base verifier without and with proof sharing, respectively. Similar to the discussion in Sect. 4, we can break down $t^{PS}$ into $t_T$ (template generation time), $t_S$ (time to propagate one input to layer k), $t_C$ (time to perform template matching), and $t_V$ (time to verify S if no match). Table 7 shows different ratios of these quantities. For all, we assume a perfect matching rate at layer k and calculate the achievable speed-up for patch verification on MNIST. Comparing the optimal and realized results, we see that at layers 3 and 4 our template generation algorithm, despite only approximately solving Problem 2, achieves near-optimal speed-up. By removing the time for template matching and template generation, we can see that, at deeper layers, speeding up $t_T$ and $t_C$ only yields diminishing returns.
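As a quick check of the near-optimality claim, the following Python snippet divides the realized speed-up by the optimal speed-up (perfect matching) for each layer, using only the values reported in Table 7.

```python
# Speed-ups reported in Table 7 (patch verification on MNIST), layers k = 1..4.
realized = [1.08, 1.83, 1.72, 1.49]   # t_BL / t_PS
optimal = [3.75, 2.51, 1.92, 1.56]    # t_BL / (t_T + r*t_S + r*t_C)

# Fraction of the achievable speed-up that the method actually realizes.
fraction = [round(rs / opt, 2) for rs, opt in zip(realized, optimal)]
print(fraction)  # [0.29, 0.73, 0.9, 0.96]
```

At layers 3 and 4 the method realizes roughly 90% and 96% of the achievable speed-up, compared with under 30% at layer 1.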


5.3 Robustness Against Geometric Perturbations 


For the verification of geometric perturbations, we take 100 images from the MNIST dataset and the 7 × 200 neural network from Sect. 5.2. In Table 8, we consider an input region with ±2° rotation, ±10% contrast, and ±1% brightness change, inspired by [3]. To verify this region, similar to existing approaches [3], we split the rotation into r regions, each yielding a Box specification over the input. Here we use m = 1, a single template, with the largest verifiable ε found via binary search. We observe that as we increase r, both the verification rate and the speed-up increase. Proof sharing enables significant speed-ups between 1.6× and 2.9×.

Finally, we investigate the impact of the number of templates m. To this end, we consider a setting with a large parameter space: an input region generated by ±40° rotation, with r = 200. In Table 9, we evaluate this for m templates obtained from the ε input perturbation around m equally spaced rotations, where we apply binary search to find an $\epsilon_i$ tailored to each template. Again we observe that m > 1 allows more template matches. However, in this setting the relative increase is much larger than for patches, thus making m = 3 faster than m = 1.
Table 8. ±2° rotation, ±10% contrast and ±1% brightness change split into r perturbations on 100 MNIST images. Verification rate, rate of splits matched and verified, along with the run time of Zonotope ($t^{BL}$) and proof sharing ($t^{PS}$).

r     verif. [%]   splits verif. [%]   splits matched [%]   $t^{BL}$   $t^{PS}$
4     73.0         87.3                73.1                 3.06       1.87
6     91.0         94.8                91.0                 9.29       3.81
8     93.0         95.9                94.2                 20.64      7.48
10    95.0         96.5                94.9                 38.50      13.38


Table 9. ±40° rotation split into 200 perturbations evaluated on MNIST. The verification rate is just 15%, but 82.1% of individual splits can be verified.

Method          m   splits matched [%]   t [s]
Baseline        -   -                    11.79
Proof Sharing   1   38.0                 9.15
                2   41.1                 9.21
                3   58.5                 8.34
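The speed-ups quoted for the geometric experiments follow directly from the runtimes in Tables 8 and 9; the Python snippet below recomputes them as baseline runtime divided by proof-sharing runtime.

```python
# Runtimes from Table 8 (Zonotope baseline t_BL, proof sharing t_PS), keyed by r.
table8 = {4: (3.06, 1.87), 6: (9.29, 3.81), 8: (20.64, 7.48), 10: (38.50, 13.38)}
# Runtimes from Table 9 in seconds, keyed by the number of templates m.
baseline9 = 11.79
table9 = {1: 9.15, 2: 9.21, 3: 8.34}

speedups8 = {r: round(t_bl / t_ps, 1) for r, (t_bl, t_ps) in table8.items()}
speedups9 = {m: round(baseline9 / t, 2) for m, t in table9.items()}
print(speedups8)  # {4: 1.6, 6: 2.4, 8: 2.8, 10: 2.9}
print(speedups9)  # {1: 1.29, 2: 1.28, 3: 1.41}
```

This matches the 1.6× to 2.9× range for Table 8. For Table 9, m = 3 gives about 1.41× versus about 1.29× for m = 1, while m = 2 is marginally slower than m = 1.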


5.4 Discussion 


We have shown that proof sharing can achieve speed-ups over conventional execution. While the speed-up analysis (see Sect. 4 and Table 7) puts a ceiling on what is achievable in particular settings, we are optimistic that proof sharing can be an important tool for neural network robustness analysis. In particular, as the size of certifiable neural networks continues to grow, so does the potential for gains via proof sharing. Further, the idea of proof effort reuse can enable efficient verification of larger disjunctive specifications such as the patch or geometric examples considered here. Besides the immediately useful speed-ups, the concept of proof sharing is interesting in its own right and can provide insights into the learning mechanisms of neural networks.


6 Related Work 
Here, we briefly discuss conceptually related work: 


Incremental Model Checking. The field of model checking aims to show whether a formalized model, e.g., of software or hardware, adheres to a specification. As neural network verification can also be cast as model checking, we review incremental model checking techniques which utilize a similar idea to proof sharing: reusing partial previous computations when checking new models or specifications. Proof sharing has been applied for discovering and reusing lemmas when proving theorems for satisfiability [6], Linear Temporal Logic [7], and the modal μ-calculus [33]. Similarly, caching solvers [35] for Satisfiability Modulo Theories cache obtained results or even the full models used to obtain the solution, with assignments for all variables, allowing for faster verification of subsequent queries. Program analysis tasks that deal with repeated similar inputs (e.g., individual commits in a software project) can leverage partial results [41], constraints [36], and precision information [4,5] from previous runs.


Proof Sharing Between Networks. In neural network verification, some approaches abstract the network to achieve speed-ups in verification. These simplifications are constructed such that the proof can be adapted to the original neural network [1,43]. Similarly, another family of approaches analyzes the difference between two closely related neural networks by utilizing their structural similarity [26,27]. Such approaches can be used to reuse analysis results between neural network modifications, e.g., fine-tuning [9,37].

In contrast to these works, we do not modify the neural network, but rather achieve speed-ups by only considering the relaxations obtained in the proofs. The authors of [37] additionally consider small changes to the input; however, these are much smaller than the difference in specification we consider here.


7 Conclusion 


We introduced the novel concept of proof sharing in the context of neural network verification. We showed how to instantiate this concept, achieving speed-ups of up to 2× for patch verification and up to 3× for geometric verification. We believe that the ideas introduced in this work can serve as a solid foundation for exploring methods that effectively share proofs in neural network verification.
Abstract. Linear approximations of nonlinear functions have a wide 
range of applications such as rigorous global optimization and, recently, 
verification problems involving neural networks. In the latter case, a lin- 
ear approximation must be hand-crafted for the neural network’s activa- 
tion functions. This hand-crafting is tedious, potentially error-prone, and 
requires an expert to prove the soundness of the linear approximation. 
Such a limitation is at odds with the rapidly advancing deep learning 
field — current verification tools either lack the necessary linear approxi- 
mation, or perform poorly on neural networks with state-of-the-art acti- 
vation functions. In this work, we consider the problem of automatically 
synthesizing sound linear approximations for a given neural network acti- 
vation function. Our approach is example-guided: we develop a procedure 
to generate examples, and then we leverage machine learning techniques 
to learn a (static) function that outputs linear approximations. How- 
ever, since the machine learning techniques we employ do not come with 
formal guarantees, the resulting synthesized function may produce linear 
approximations with violations. To remedy this, we bound the maximum 
violation using rigorous global optimization techniques, and then adjust 
the synthesized linear approximation accordingly to ensure soundness. 
We evaluate our approach on several neural network verification tasks. 
Our evaluation shows that the automatically synthesized linear approx- 
imations greatly improve the accuracy (i.e., in terms of the number of 
verification problems solved) compared to hand-crafted linear approxi- 
mations in state-of-the-art neural network verification tools. An artifact 
with our code and experimental scripts is available at: https://zenodo.org/record/6525186#.Yp51L9LMIzM.


1 Introduction 


Neural networks have become a popular model choice in machine learning due to their performance across a wide variety of tasks ranging from image classification to natural language processing and control. However, they are also known to misclassify inputs in the presence of both small amounts of input noise and seemingly insignificant perturbations to the inputs [37]. Indeed, many works have shown they are vulnerable to a variety of seemingly benign input transformations [1,9,17], which raises concerns about their deployment in safety-critical systems. As a result, a large number of works have proposed verification techniques to prove that a neural network is not vulnerable to these perturbations [35,43,44], or in general satisfies some specification [15,18,27,28].

Crucial to the precision and scalability of these verification techniques are 
linear approximations of the network’s activation functions. 

In essence, given some arbitrary activation function $\sigma(x)$, a linear approximation is a coefficient generator function $G(l,u) \rightarrow (a_l, b_l, a_u, b_u)$, where $l, u \in \mathbb{R}$ are real values that correspond to the interval $[l,u]$, and $a_l, b_l, a_u, b_u \in \mathbb{R}$ are real-valued coefficients in the linear lower and upper bounds such that the following condition holds:

$$\forall x \in [l,u].\; a_l \cdot x + b_l \le \sigma(x) \le a_u \cdot x + b_u \qquad (1)$$
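Condition (1) can be probed numerically, with one caveat: sampling can only refute a candidate, never prove it sound, which is the gap the rest of this paper addresses. The Python sketch below uses swish as σ; the function name find_violation and the candidate coefficients are illustrative choices of ours.

```python
import math


def swish(x):
    """Example activation: swish(x) = x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def find_violation(sigma, l, u, a_l, b_l, a_u, b_u, samples=1001):
    """Return a sampled x in [l, u] that violates Eq. (1), or None.

    Sampling can refute a candidate (a_l, b_l, a_u, b_u) but never prove it sound.
    """
    for k in range(samples):
        x = l + (u - l) * k / (samples - 1)
        y = sigma(x)
        if a_l * x + b_l > y or y > a_u * x + b_u:
            return x
    return None


# On [0, 2], swish is convex and non-negative, so the chord is a sound upper
# bound and y = 0 a sound lower bound; the tangent at 0 (y = x / 2) is not.
chord_slope = swish(2.0) / 2.0
print(find_violation(swish, 0.0, 2.0, 0.0, 0.0, chord_slope, 0.0))  # None
print(find_violation(swish, 0.0, 2.0, 0.0, 0.0, 0.5, 0.0))          # 0.002
```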


Indeed, a key contribution in many seminal works on neural network verification was a hand-crafted G(l, u) [2,7,19,33–35,42–45,47], and follow-up work has built on these hand-crafted approximations [36,38]. Furthermore, linear approximations have applications beyond neural network verification, such as rigorous global optimization and verification [21,40].

However, crafting G(l,u) is tedious, error-prone, and requires an expert. 
Unfortunately, in the case of neural network activation functions, experts have 
only crafted approximations for the most common functions, namely ReLU, 
sigmoid, tanh, max-pooling, and those in vanilla LSTMs. As a result, existing 
techniques cannot handle new and cutting-edge activation functions, such as 
Swish [31], GELU [14], Mish [24], and LiSHT [32]. 

In this work, we consider the problem of automatically synthesizing the coefficient generator function G(l, u), which can alternatively be viewed as four individual functions $G_{a_l}(l,u)$, $G_{b_l}(l,u)$, $G_{a_u}(l,u)$, and $G_{b_u}(l,u)$, one for each coefficient. However, synthesizing the generator functions is a challenging task because (1) the search space for each function is very large (in fact, technically infinite), (2) the optimal generator functions are highly nonlinear for all activation functions considered both in our work and prior work, and (3) to prove soundness of the synthesized generator functions, we must show:

$$\forall [l,u] \in \mathbb{IR},\, x \in [l,u].\; G_{a_l}(l,u) \cdot x + G_{b_l}(l,u) \le \sigma(x) \le G_{a_u}(l,u) \cdot x + G_{b_u}(l,u) \qquad (2)$$

where $\mathbb{IR} = \{[l,u] \mid l,u \in \mathbb{R},\, l \le u\}$ is the set of all real intervals. The above equation has highly nonlinear constraints, which cannot be directly handled by standard verification tools, such as the Z3 [6] SMT solver.

Fig. 1. Overview of our method for synthesizing the coefficient generator function.

To solve the problem, we propose a novel example-guided synthesis and verification approach, which is applicable to any differentiable, Lipschitz-continuous activation function $\sigma(x)$. (We note that activation functions are typically required to be differentiable and Lipschitz-continuous in order to be trained by gradient descent; thus, our approach applies to any practical activation function.) To tackle the potentially infinite search space of G(l, u), we first propose 
two templates for G(l, u), which are inspired by the hand-crafted coefficient functions of prior work. The “holes” in each template are filled by a machine learning model, in our case a small neural network or linear regression model. The first step is to partition the input space of G(l, u) and assign a single template to each subset in the partition. The second step is to fill in the holes of each template. Our approach leverages an example-generation procedure to produce a large number of training examples of the form $((l, u), (a_l, b_l, a_u, b_u))$, which can then be used to train the machine learning component in the template. However, a template instantiated with a trained model may still violate Eq. 2; specifically, the lower bound (resp. upper bound) may be above (resp. below) the activation function over some interval [l, u]. To ensure soundness, the final step is to bound the maximum violation of a particular template instance using a rigorous global optimization technique based on interval analysis, which is implemented by the tool IbexOpt [5]. We then use the computed maximum violation to adjust the template to ensure Eq. 2 always holds.

The overall flow of our method is shown in Fig. 1. It takes as input the activation function $\sigma(x)$ and the set of input intervals $I_x \subseteq \mathbb{IR}$ for which G(l, u) will be valid. During design time, we follow the previously described approach, which outputs a set of sound, instantiated templates that make up G(l, u). Then the synthesized G(l, u) is integrated into an existing verification tool such as AutoLiRPA [46] or DeepPoly [35]. These tools take as input a neural network and a specification, and output the verification result (proved, counterexample, or unknown). At application time (i.e., when attempting to verify the input specification), when these tools need a linear approximation for $\sigma(x)$ over the interval [l, u], we look up the appropriate template instance, use it to compute the linear approximation $(a_l, b_l, a_u, b_u)$, and return it to the tool.

To the best of our knowledge, our method is the first to synthesize a linear approximation generator function G(l, u) for any given activation function $\sigma(x)$. Our approach is fundamentally different from the ones used by state-of-the-art neural network verification tools such as AutoLiRPA and DeepPoly, which require an expert to hand-craft the approximations. We note that, while AutoLiRPA can handle activations that it does not explicitly support by decomposing $\sigma(x)$ into elementary operations for which it has (hand-crafted) linear approximations and then combining them, the resulting bounds are often not tight. In contrast, our method synthesizes linear approximations for $\sigma(x)$ as a whole, and we show experimentally that our synthesized approximations significantly outperform AutoLiRPA.

We have implemented our approach and evaluated it on popular neural 
network verification problems (specifically, robustness verification problems in 
the presence of input perturbations). Compared against state-of-the-art lin- 
ear approximation based verification tools, our synthesized linear approxima- 
tions can drastically outperform these existing tools in terms of the number of 
problems verified on recently published activation functions such as Swish [31], 
GELU [14], Mish [24], and LiSHT [32]. 

To summarize, we make the following contributions: 


— We propose the first method for synthesizing the linear approximation gen- 
erator function G(l, u) for any given activation function. 

— We implement our method, use it to synthesize linear approximations for 
several novel activation functions, and integrate these approximations into a 
state-of-the-art neural network verification tool. 

— We evaluate our method on a large number of neural network verification 
problems, and show that our synthesized approximations significantly out- 
perform the state-of-the-art tools. 


2 Preliminaries 


In this section, we discuss background knowledge necessary to understand our work. Throughout the paper, we will use the following notation: for variables or scalars we use lower case letters (e.g., $x \in \mathbb{R}$), for vectors we use bold lower case letters (e.g., $\mathbf{x} \in \mathbb{R}^n$), and for matrices we use bold upper case letters (e.g., $\mathbf{W} \in \mathbb{R}^{n \times m}$). In addition, we use standard interval notation: we let $[l,u] = \{x \in \mathbb{R} \mid l \le x \le u\}$ be a real-valued interval, we denote the set of all real intervals as $\mathbb{IR} = \{[l,u] \mid l,u \in \mathbb{R},\, l \le u\}$, and finally we define the set of n-dimensional intervals as $\mathbb{IR}^n = \{\times_{i=1}^{n} [l_i,u_i] \mid [l_i,u_i] \in \mathbb{IR}\}$, where $\times$ is the Cartesian product.


2.1 Neural Networks 


We consider a neural network to be a function $f : \mathcal{X} \subseteq \mathbb{R}^n \rightarrow \mathcal{Y} \subseteq \mathbb{R}^m$, which has n inputs and m outputs. For ease of presentation, we focus the discussion on feed-forward, fully-connected neural networks (although the bounds synthesized by our method apply to all neural network architectures). For $\mathbf{x} \in \mathcal{X}$, such networks compute $f(\mathbf{x})$ by performing an alternating series of matrix multiplications followed by the element-wise application of an activation function $\sigma(x)$. Formally, an l-layer neural network with $k_i$ neurons in each layer (and letting $k_0 = n, k_l = m$) has l weight matrices and bias vectors $\mathbf{W}_i \in \mathbb{R}^{k_{i-1} \times k_i}$ and $\mathbf{b}_i \in \mathbb{R}^{k_i}$ for $i \in \{1..l\}$. The input of the network is $\mathbf{f}_0 = \mathbf{x}^T$, and the output of layer i is given by the function $\mathbf{f}_i = \sigma(\mathbf{f}_{i-1} \cdot \mathbf{W}_i + \mathbf{b}_i)$, which can be applied recursively until the output layer of the network is reached.
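The recursive definition above translates directly into a short NumPy sketch. It follows the row-vector convention of the text and, taking the formula literally, also applies σ at the output layer; the toy weights and the function names forward and relu are made up for illustration.

```python
import numpy as np


def forward(x, weights, biases, sigma):
    """Evaluate f(x) by applying f_i = sigma(f_{i-1} W_i + b_i) layer by layer.

    Row-vector convention as in the text: W_i has shape (k_{i-1}, k_i) and
    b_i has shape (k_i,). Following the formula literally, sigma is also
    applied at the output layer.
    """
    f = np.asarray(x, dtype=float)  # f_0 = x^T
    for W, b in zip(weights, biases):
        f = sigma(f @ W + b)
    return f


def relu(z):
    return np.maximum(z, 0.0)


# A toy 2-2-1 network with made-up weights.
W1 = np.array([[1.0, -1.0], [2.0, 0.5]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0], [3.0]])
b2 = np.array([0.5])
print(forward([1.0, 2.0], [W1, W2], [b1, b2], relu))  # [5.5]
```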

Initially, common choices for the activation function $\sigma(x)$ were $\mathrm{ReLU}(x) = \max(0,x)$, $\mathrm{sigmoid}(x) = \frac{1}{1+e^{-x}}$, and $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$; however, the field has advanced rapidly in recent years and, as a result, automatically discovering novel activations has become a research subfield of its own [31]. Many recently proposed activations, such as Swish and GELU [14,31], have been shown to outperform the common choices in important machine learning tasks.
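For reference, the newer activations named in this paper have short closed forms. The Python sketch below gives the standard definitions of Swish (with β = 1, i.e. x · sigmoid(x), as used here), exact GELU, Mish, and LiSHT for scalar inputs; the numerically stable softplus is our implementation choice.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    """Swish with beta = 1 (also known as SiLU): x * sigmoid(x)."""
    return x * sigmoid(x)


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def lisht(x):
    """LiSHT: x * tanh(x)."""
    return x * math.tanh(x)


for name, act in [("swish", swish), ("gelu", gelu), ("mish", mish), ("lisht", lisht)]:
    print(name, round(act(1.0), 4))
```

All four are smooth and non-monotonic or non-convex in places, which is what makes hand-crafting tight linear bounds for them tedious.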


2.2 Existing Neural Network Verification Techniques 
and Limitations 


We consider neural network verification problems of the following form: given a neural network $f : \mathcal{X} \rightarrow \mathcal{Y}$ and an input set $X \subseteq \mathcal{X}$, compute an over-approximation $Y$ such that $\{f(\mathbf{x}) \mid \mathbf{x} \in X\} \subseteq Y \subseteq \mathcal{Y}$. The most scalable approaches to neural network verification (where scale is measured by the number of neurons in the network) use linear bounding techniques to compute Y, which require a linear approximation of the network's activation function. This is an extension of interval analysis [26] (e.g., intervals with linear lower/upper bounds [35,46]) to compute Y, and thus X and Y are represented as elements of $\mathbb{IR}^n$ and $\mathbb{IR}^m$, respectively.

We use Fig. 2 to illustrate a typical neural network verification problem. The network has input neurons $x_1, x_2$, output neurons $x_7, x_8$, and a single hidden layer. We assume the activation function is $\mathrm{swish}(x) = x \cdot \mathrm{sigmoid}(x)$, which is shown by the blue line in Fig. 3. Our input space is $X = [-1,1] \times [-1,1]$ (i.e., $x_1, x_2 \in [-1,1]$), and we want to prove $x_7 > x_8$, which can be accomplished by first computing the bounds $x_7 \in [l_7, u_7]$, $x_8 \in [l_8, u_8]$, and then showing $l_7 > u_8$. Following prior work [35] and for simplicity, we split the affine transformation and the application of the activation function in the hidden layer into two steps, and we assume the neurons $x_i$, where $i \in \{1..8\}$, are ordered such that $i < j$ implies that $x_i$ is in either the same layer as $x_j$ or a layer prior to $x_j$.

Linear bounding based neural network verification techniques work as follows. For each neuron $x_i$, they compute the concrete lower and upper bounds $l_i$ and $u_i$, together with symbolic lower and upper bounds. The symbolic lower and upper bounds are linear constraints $\sum_{j=1}^{i-1} c^l_j \cdot x_j + c^l_i \le x_i \le \sum_{j=1}^{i-1} c^u_j \cdot x_j + c^u_i$, where each of $c^l_j, c^u_j$ is a constant. Both bounds are computed in a forward layer-by-layer fashion, using the result of the previous layers to compute bounds for the current layer.

Fig. 2. An example of linear bounding for neural network verification.

We illustrate the computation in Fig. 2. In the beginning, we have $x_1 \in [-1,1]$ as the concrete bounds and $-1 \le x_1 \le 1$ as the symbolic bounds, and similarly for $x_2$. To obtain bounds for $x_3, x_4$, we multiply $x_1, x_2$ by the edge weights, which for $x_3$ gives the linear bounds $-x_1 + x_2 \le x_3 \le -x_1 + x_2$. Then, to compute $l_3$ and $u_3$, we minimize and maximize the linear lower and upper bounds, respectively, over $x_1, x_2 \in [-1,1]$. Doing so results in $l_3 = -2, u_3 = 2$. We obtain the same result for $x_4$.
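Minimizing or maximizing a linear bound over a box decomposes coordinate-wise: each term reaches its extreme at the lower or upper end of its variable's interval, depending on the sign of the coefficient. The Python sketch below (function names are ours) reproduces $l_3 = -2, u_3 = 2$ from the running example; in general the lower expression is minimized and the upper one maximized, and here the two expressions coincide.

```python
def minimize_linear(coeffs, const, box):
    """Minimum of sum_j coeffs[j] * x_j + const over the box prod_j [l_j, u_j]."""
    return const + sum(c * (l if c >= 0 else u) for c, (l, u) in zip(coeffs, box))


def maximize_linear(coeffs, const, box):
    """Maximum of sum_j coeffs[j] * x_j + const over the same box."""
    return const + sum(c * (u if c >= 0 else l) for c, (l, u) in zip(coeffs, box))


box = [(-1.0, 1.0), (-1.0, 1.0)]               # x1, x2 in [-1, 1]
l3 = minimize_linear([-1.0, 1.0], 0.0, box)    # lower bound -x1 + x2
u3 = maximize_linear([-1.0, 1.0], 0.0, box)    # upper bound -x1 + x2
print(l3, u3)  # -2.0 2.0
```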

However, we encounter a key challenge when attempting to bound $x_5$, as we need a linear approximation of $\sigma(x_3)$ over $[l_3, u_3]$ when bounding $x_5$, and similarly for $x_6$. Here, a linear approximation for $x_5$ can be regarded as a set of coefficients $a_l, b_l, a_u, b_u$ such that the following soundness condition holds: $\forall x_3 \in [l_3,u_3].\; a_l \cdot x_3 + b_l \le \sigma(x_3) \le a_u \cdot x_3 + b_u$. In addition, a subgoal for the bounds is tightness, which typically means the volume between the bounds and $\sigma(x)$ is minimized. Crafting a function to generate these coefficients has been the subject of many prior works. Many seminal papers on neural network verification have focused on solving this problem alone. Broadly speaking, they fall into the following categories.


Hand-Crafted Approximation Techniques. The first category of techniques uses hand-crafted functions for generating $a_l, b_l, a_u, b_u$. Hand-crafted functions are generally fast because they are static, and tight because an expert designed them. Unfortunately, current works in this category are not general: they only consider the most common activation functions, and thus cannot currently handle our motivating example or any recent, novel activation functions. For these works to apply to our motivating example, an expert would need to hand-craft an approximation for the activation function, which is both difficult and error-prone.


Expensive Solver-Aided Techniques. The second category uses expensive solvers and optimization tools to compute sound and tight bounds in a general way, but at the cost of runtime. Recent works include DiffRNN [25] and POPQORN [19]. The former uses (unsound) optimization to synthesize candidate coefficients and then uses an SMT solver to verify soundness of the bounds. The latter uses constrained gradient descent to compute coefficients. We note that, while these works do not explicitly target an arbitrary activation function $\sigma(x)$, their techniques can be naturally extended. Their high runtime and computational cost are undesirable and, in general, make them less scalable than the first category.


Decomposing Based Techniques. The third category combines hand-crafted approximations with a decomposing based technique to obtain generality and efficiency, but at the cost of tightness. Interestingly, this is similar to the approach used by nonlinear SMT solvers and optimizers such as dReal [11] and Ibex [5]. To the best of our knowledge, only one work, AutoLiRPA [46], implements this approach for neural network verification. Illustrating on our example, AutoLiRPA does not have a static linear approximation for $\sigma(x_3) = x_3 \cdot \mathrm{sigmoid}(x_3)$, but it has static approximations for $\mathrm{sigmoid}(x_3)$ and $x_3 \cdot y$. Thus we can bound $\mathrm{sigmoid}(x_3)$ over $x_3 \in [-2,2]$, and then, letting $y = \mathrm{sigmoid}(x_3)$, bound $x_3 \cdot y$. Doing so results in the approximation shown as red lines in Fig. 3. While useful, these bounds are suboptimal because they do not minimize the area between the two bounding lines. This suboptimality occurs due to the decomposition, i.e., the static approximations used here were not designed for swish(x) as a whole, but for the individual elementary operations.

Fig. 3. Approximation of AutoLiRPA (red) and our approach (green). (Color figure online)


Our Work: Synthesizing Static Approximations. Our work overcomes the limitation of prior work by automatically synthesizing a static function specifically for any given activation function $\sigma(x)$ without decomposing. Since the synthesis is automatic and results in a bound generator function, we obtain generality and efficiency, and since the synthesis targets $\sigma(x)$ specifically, we usually (as demonstrated empirically) obtain tightness. In Fig. 3, for example, the bounds computed by our method are represented by the green lines. The synthesized bound generator function can then be integrated into state-of-the-art neural network verification tools, including AutoLiRPA.


Wrapping Up the Example. For our running example, using AutoLiRPA's linear approximation, we would add the linear bounds for $x_5$ shown in Fig. 2. To compute $l_5, u_5$, we would substitute the linear bounds for $x_3$ into $x_5$'s linear bounds, resulting in linear bounds with only $x_1, x_2$ terms that can be minimized/maximized for $l_5, u_5$, respectively. We do the same for $x_6$, and then we repeat the entire process until the output layer is reached.
3 Problem Statement and Challenges 


In this section, we formally define the synthesis problem and then explain the 
technical challenges. During the discussion, we focus on synthesizing the gen- 
erator functions for the upper bound, but in Sect.3.1, we explain how we can 
obtain the lower bound functions. 


3.1 The Synthesis Problem 


Given an activation function $\sigma(x)$ and an input universe $x \in [l_x, u_x]$, we define the set of all intervals over x in this universe as $I_x = \{[l,u] \mid [l,u] \in \mathbb{IR},\, l,u \in [l_x, u_x]\}$. In our experiments, for instance, we use $l_x = -10$ and $u_x = 10$. Note that if we encounter an $[l,u] \notin I_x$, we fall back to a decomposing-based technique.

Our goal is to synthesize a generator function $G(l,u) \rightarrow (a_u, b_u)$, or equivalently, two generator functions $G_{a_u}(l,u)$ and $G_{b_u}(l,u)$ such that $\forall [l,u] \in I_x, x \in \mathbb{R}$, the condition $x \in [l,u] \Rightarrow \sigma(x) \le G_{a_u}(l,u) \cdot x + G_{b_u}(l,u)$ holds. This is the same as requiring that the following condition does not hold (i.e., the formula is unsatisfiable):

$$\exists [l,u] \in I_x,\, x \in \mathbb{R}.\; x \in [l,u] \wedge \sigma(x) > G_{a_u}(l,u) \cdot x + G_{b_u}(l,u) \qquad (3)$$


The formula above expresses the search for a counterexample, i.e., an input interval [l, u] such that $G_{a_u}(l,u) \cdot x + G_{b_u}(l,u)$ is not a sound upper bound of $\sigma(x)$ over the interval [l, u]. Thus, if the above formula is unsatisfiable, the soundness of the coefficient functions $G_{a_u}, G_{b_u}$ is proved. We note that we can obtain the lower bound generator functions $G_{a_l}(l,u), G_{b_l}(l,u)$ by synthesizing upper bound functions $\hat{G}_{a_u}(l,u), \hat{G}_{b_u}(l,u)$ for $-\sigma(x)$ (i.e., reflecting $\sigma(x)$ across the x-axis), and then letting $G_{a_l} = -\hat{G}_{a_u}(l,u)$, $G_{b_l} = -\hat{G}_{b_u}(l,u)$.
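The reflection step amounts to negating both coefficients: if $-\sigma(x) \le a x + b$ on [l, u], then $\sigma(x) \ge -a x - b$ there. The Python sketch below wraps any upper-bound generator for $-\sigma$ accordingly; the function names and the toy activation $x^2$ (not from the paper) are illustrative assumptions.

```python
def lower_from_upper(upper_gen_for_neg):
    """Build a lower-bound generator for sigma from an upper-bound generator for -sigma.

    If -sigma(x) <= a * x + b on [l, u], then sigma(x) >= (-a) * x + (-b) there.
    """
    def lower_gen(l, u):
        a, b = upper_gen_for_neg(l, u)
        return -a, -b
    return lower_gen


# Toy example (not from the paper): sigma(x) = x**2, so -sigma is concave and the
# tangent at the midpoint m is a sound upper bound: -x**2 <= -2*m*x + m**2.
def upper_for_neg_square(l, u):
    m = (l + u) / 2.0
    return -2.0 * m, m * m


lower_square = lower_from_upper(upper_for_neg_square)
a_l, b_l = lower_square(0.0, 2.0)
print(a_l, b_l)  # 2.0 -1.0, i.e. x**2 >= 2*x - 1
```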

In addition to soundness, we want the bound to be tight, which in our context has two complementary goals. For a given $[l,u] \in I_x$, we should have (1) $\sigma(z) = G_{a_u}(l,u) \cdot z + G_{b_u}(l,u)$ for at least one $z \in [l,u]$ (i.e., the bound touches $\sigma(x)$ at some point z), and (2) the volume below $G_{a_u}(l,u) \cdot x + G_{b_u}(l,u)$ should be minimized (which we note is equivalent to minimizing the volume between the upper bound and $\sigma(x)$, since $\sigma(x)$ is fixed). We will illustrate the volume by the shaded green region below the dashed bounding line in Fig. 6.

The first goal is intuitive: if the bound does not touch $\sigma(x)$, then it can be shifted downward by some constant. The second goal is a heuristic taken from prior work that has been shown to yield a precise approximation of the neural network's output set.
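For a single interval, the volume in goal (2) is the area between the upper bound and σ. The Python sketch below approximates it with the trapezoidal rule (our choice of quadrature; the function name gap_area is ours) and shows that shifting a bound up by a constant c adds exactly c · (u − l) to this area, which is why a bound that does not touch σ is never optimal.

```python
import math


def swish(x):
    return x / (1.0 + math.exp(-x))


def gap_area(sigma, a_u, b_u, l, u, n=10_000):
    """Area between the upper bound a_u * x + b_u and sigma over [l, u],
    approximated with the composite trapezoidal rule on n subintervals."""
    h = (u - l) / n
    total = 0.0
    for k in range(n + 1):
        x = l + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * (a_u * x + b_u - sigma(x))
    return total * h


# Chord of swish over [0, 2]: sound there (swish is convex on [0, 2]) and it
# touches swish at both endpoints, satisfying goal (1).
a = swish(2.0) / 2.0
tight = gap_area(swish, a, 0.0, 0.0, 2.0)
loose = gap_area(swish, a, 0.1, 0.0, 2.0)  # same line shifted up by 0.1
print(round(loose - tight, 6))  # 0.2 = 0.1 * (u - l)
```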


3.2 Challenges and Our Solution 


Fig. 4. Illustration of the two-point form bound (upper dashed line) and tangent-line form bound (lower dashed line).

We face three challenges in searching for the generator functions $G_{a_u}$ and $G_{b_u}$. First, we must restrict the search space so that a candidate can be found in a reasonable amount of time (i.e., the search is tractable). The second challenge, which is at odds with the first, is that we must have a large enough search space such that it permits candidates that represent tight bounds. Finally, the third challenge, which is at odds with the second, is that we must be able to formally verify $G_{a_u}, G_{b_u}$ to be sound. While more complex generator functions $(G_{a_u}, G_{b_u})$ will likely produce tighter bounds, they will be more difficult (if not impractical) to verify.

We tackle these challenges by proposing two templates for $G_{a_u}, G_{b_u}$ and then developing an approach for selecting the appropriate template. We observe that prior work has always expressed the linear bound for $\sigma(x)$ over an interval $x \in [l,u]$ either as the line connecting the points $(l, \sigma(l)), (u, \sigma(u))$, referred to as the two-point form, or as the line tangent to $\sigma(x)$ at a point t, referred to as the tangent-line form. We illustrate both forms in Fig. 4. Assuming that $\sigma'(x)$ is the derivative of $\sigma(x)$, the two templates for $G_{a_u}$ and $G_{b_u}$ are as follows:

$$G_{a_u}(l,u) = \frac{\sigma(u) - \sigma(l)}{u - l}, \qquad G_{b_u}(l,u) = -G_{a_u}(l,u) \cdot l + \sigma(l) + \epsilon \qquad \text{(two-point form template)} \quad (4)$$

$$G_{a_u}(l,u) = \sigma'(g(l,u)), \qquad G_{b_u}(l,u) = -G_{a_u}(l,u) \cdot g(l,u) + \sigma(g(l,u)) + \epsilon \qquad \text{(tangent-line form template)} \quad (5)$$


In these templates, there are two holes to fill during synthesis: ε and g(l, u). Here, ε is a real-valued constant upward (positive) shift that ensures soundness of the linear bounds computed by both templates. We compute ε when we verify the soundness of the template (discussed in Sect. 4.3). In addition to ε, for the tangent-line template, we must synthesize a function g(l, u) = t, which takes the interval [l, u] as input and returns the tangent point t as output.
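Both templates are cheap to evaluate once ε and g are fixed. The Python sketch below instantiates Eqs. (4) and (5) for swish; the midpoint rule used for g is a hypothetical placeholder for the learned model, and ε defaults to 0 here rather than the verified value computed in Sect. 4.3.

```python
import math


def sigma(x):
    """Example activation: swish(x) = x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def dsigma(x):
    """Derivative of swish: s(x) + x * s(x) * (1 - s(x)), with s the sigmoid."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s + x * s * (1.0 - s)


def two_point(l, u, eps=0.0):
    """Eq. (4): line through (l, sigma(l)) and (u, sigma(u)), shifted up by eps.
    Requires l < u."""
    a_u = (sigma(u) - sigma(l)) / (u - l)
    b_u = -a_u * l + sigma(l) + eps
    return a_u, b_u


def tangent_line(l, u, g, eps=0.0):
    """Eq. (5): tangent to sigma at t = g(l, u), shifted up by eps."""
    t = g(l, u)
    a_u = dsigma(t)
    b_u = -a_u * t + sigma(t) + eps
    return a_u, b_u


def midpoint(l, u):
    """Hypothetical placeholder for the learned tangent-point function g."""
    return (l + u) / 2.0


print(two_point(-1.0, 1.0))
print(tangent_line(-1.0, 1.0, midpoint))  # (0.5, 0.0): tangent at t = 0
```

Note that swish is convex on [−1, 1], so this tangent at t = 0 lies below σ and is not a sound upper bound with ε = 0; assigning the right template to each region and computing ε are exactly what the remaining steps handle.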

These two templates, together, address the previously mentioned three challenges. For the first challenge, the two-point form actually does not have a search space, and thus can be computed efficiently, and for the tangent-line form, we only need to synthesize the function g(l, u). In Sect. 4.2, we will show empirically that g(l, u) tends to be much easier to learn than a function that directly predicts the coefficients $a_u, b_u$. For the second challenge, if the two-point form is sound, then it is also tight since the bound touches $\sigma(x)$ by construction. Similarly, the tangent-line form touches $\sigma(x)$ at t. For the third challenge, we will show empirically that these templates can be verified to be sound in a reasonable amount of time (on the order of an hour).

At a high level, our approach contains three steps. The first step is to partition $I_x$ into subsets, and then for each subset we assign a fixed template: either the two-point form template or the tangent-line form template. The advantage of partitioning is two-fold. First, no single template is a good fit for the entire $I_x$, and thus partitioning results in overall tighter bounds. Second, if the final verified template for a particular subset has a large violation (which results in a large upward shift and thus less tight bounds), the effect is localized to that subset only. Once we have assigned a template to each subset of $I_x$, the second step is to learn a g(l, u) for each subset that was assigned a tangent-line template. We use an example-generation procedure to generate training examples, which are then used to train a machine learning model. After learning each g(l, u), the third step is to compute ε for all of the templates. We phrase the search for a sound ε as a nonlinear global optimization problem, and then use the interval-based solver IbexOpt [5] to bound ε.
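To see why the third step is needed, the sketch below estimates the maximum violation of a template instance over one box of intervals by dense sampling. This only under-approximates the true maximum, so unlike IbexOpt's rigorous bound it cannot certify a sound ε; the function names, the grid resolution, and the two example boxes are hypothetical, with swish and the two-point template as the example.

```python
import math


def swish(x):
    return x / (1.0 + math.exp(-x))


def chord(l, u):
    """Two-point template (Eq. 4) without the eps shift."""
    a = (swish(u) - swish(l)) / (u - l)
    return a, -a * l + swish(l)


def sampled_violation(sigma, gen, l_range, u_range, n=40):
    """Largest sampled value of sigma(x) - (a*x + b) over a box of intervals.

    This only under-approximates the true maximum violation; a rigorous
    bound (as IbexOpt provides in the paper) is needed for soundness.
    """
    worst = 0.0
    for i in range(n + 1):
        l = l_range[0] + (l_range[1] - l_range[0]) * i / n
        for j in range(n + 1):
            u = u_range[0] + (u_range[1] - u_range[0]) * j / n
            if u <= l:
                continue  # not a valid interval with l < u
            a, b = gen(l, u)
            for k in range(n + 1):
                x = l + (u - l) * k / n
                worst = max(worst, sigma(x) - (a * x + b))
    return worst


print(sampled_violation(swish, chord, (-1.0, 0.0), (0.0, 1.0)))  # ~0: convex region
print(sampled_violation(swish, chord, (1.0, 2.0), (4.0, 5.0)))   # > 0: needs eps
```

The second box contains intervals reaching into the region where swish is concave (roughly x > 2.4), so the chord dips below σ and a positive upward shift ε is required there.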


4 Our Approach 


In this section, we first present our method for partitioning $I_x$, the input interval space, into disjoint subsets and then assigning a template to each subset. Then, we present the method for synthesizing the bounds-generating function for a subset in the partition of $I_x$ (see Sect. 3.1). Next, we present the method for making the bounds-generating functions sound. Finally, we present the method for efficiently looking up the appropriate template at runtime.


4.1 Partitioning the Input Interval Space ($I_x$) 


A key consideration when partitioning $I_x$ is how to represent each disjoint subset of input intervals. While we could use a highly expressive representation such as polytopes or even nonlinear constraints, for efficiency reasons we represent each subset (of input intervals) as a box. Since a subset uses either the two-point form template or the tangent-line form template, the input interval space can be divided into $I_x = I_{2pt} \cup I_{tan}$. Each of $I_{2pt}$ and $I_{tan}$ is a set of boxes.

At a high level, our approach first partitions $I_x$ into uniformly sized disjoint boxes, and then assigns each box to either $I_{2pt}$ or $I_{tan}$. In Fig. 5, we illustrate the partition computed for $\mathrm{swish}(x) = x \cdot \mathrm{sigmoid}(x)$. The x-axis and y-axis represent the lower bound $l$ and the upper bound $u$, respectively; thus a point $(l, u)$ on this graph represents the interval $[l, u]$, and a box on this graph denotes the set of intervals represented by the points contained within it. We give details on computing the partition below.

Fig. 5. Partition of $I_x$ for the Swish activation function, where the blue boxes belong to $I_{tan}$, and the green boxes belong to $I_{2pt}$. (Color figure online)


Defining the Boxes. We first define a constant parameter $c_s$, which is the width and height of each box in the partition of $I_x$. In Fig. 5, $c_s = 1$. The benefit of using a smaller $c_s$ value is two-fold. First, it allows us to more accurately choose the proper template (two-point or tangent-line) for a given interval $[l, u]$. Second, as mentioned previously, the negative impact of a template with a large violation (i.e., large $\epsilon$) is localized to a smaller set of input intervals.

Assuming that $(u_x - l_x)$ is divisible by $c_s$, we have $\left(\frac{u_x - l_x}{c_s}\right)^2$ disjoint boxes in the partition of $I_x$, which we represent by $I_{i,j}$, where $i, j \in \{0, 1, \ldots, \frac{u_x - l_x}{c_s} - 1\}$. $I_{i,j}$ represents the box whose lower-left corner is located at $(l_x + i \cdot c_s,\ l_x + j \cdot c_s)$, or alternatively,

$$I_{i,j} = \{[l, u] \mid l \in [l_x + i \cdot c_s,\ l_x + i \cdot c_s + c_s],\ u \in [l_x + j \cdot c_s,\ l_x + j \cdot c_s + c_s]\}.$$
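As a worked instance of this indexing, here is the box count and one concrete box under the parameters reported in Sect. 5 ($l_x = -10$, $u_x = 10$, $c_s = 0.25$), assuming the indices $i, j$ start at 0:

$$\frac{u_x - l_x}{c_s} = \frac{10 - (-10)}{0.25} = 80, \qquad \left(\frac{u_x - l_x}{c_s}\right)^2 = 80^2 = 6400 \text{ boxes}$$

$$I_{40,44} = \{[l, u] \mid l \in [-10 + 40 \cdot 0.25,\ -10 + 41 \cdot 0.25],\ u \in [-10 + 44 \cdot 0.25,\ -10 + 45 \cdot 0.25]\} = \{[l, u] \mid l \in [0, 0.25],\ u \in [1, 1.25]\}$$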

To determine which boxes $I_{i,j}$ belong to the subset $I_{2pt}$, we uniformly sample intervals $[l, u] \in I_{i,j}$. Then, for each sampled interval $[l, u]$, we compute the two-point form for $[l, u]$ and search for a counter-example to the inequality $\sigma(x) \le G_{a_u}(l, u) \cdot x + G_{b_u}(l, u)$ by sampling $x \in [l, u]$. If no counter-example is found for more than half of the sampled $[l, u] \in I_{i,j}$, we add the box $I_{i,j}$ to $I_{2pt}$; otherwise, we add the box to $I_{tan}$.
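A minimal sketch of this assignment heuristic follows. It assumes Swish as the example $\sigma$ and uses the two-point line through $(l, \sigma(l))$ and $(u, \sigma(u))$ with $\epsilon = 0$; the sample counts, the floating-point tolerance, and the helper names `two_point` and `assign_template` are illustrative choices, not the authors' settings. Invalid samples with $u \le l$ are skipped, and a box with no valid sample defaults to the tangent-line template (also an assumption).

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def swish(x: float) -> float:
    # Example activation; any scalar function sigma(x) can be passed instead.
    return x * sigmoid(x)


def two_point(sigma, l: float, u: float) -> tuple[float, float]:
    """Slope and intercept of the chord through (l, sigma(l)) and (u, sigma(u)), epsilon = 0."""
    a = (sigma(u) - sigma(l)) / (u - l)
    b = sigma(l) - a * l
    return a, b


def assign_template(sigma, box, n_intervals=50, n_points=50, seed=0) -> str:
    """Return "2pt" or "tan" for box = ((l_lo, l_hi), (u_lo, u_hi)).

    Assumption: the sample counts and the 1e-12 tolerance are illustrative.
    """
    rng = random.Random(seed)
    (l_lo, l_hi), (u_lo, u_hi) = box
    valid = no_counterexample = 0
    for _ in range(n_intervals):
        l, u = rng.uniform(l_lo, l_hi), rng.uniform(u_lo, u_hi)
        if u <= l:                      # invalid (or degenerate) interval: skip it
            continue
        valid += 1
        a, b = two_point(sigma, l, u)
        # Search for x in [l, u] with sigma(x) > a*x + b (a counter-example).
        xs = (rng.uniform(l, u) for _ in range(n_points))
        if all(sigma(x) <= a * x + b + 1e-12 for x in xs):
            no_counterexample += 1
    return "2pt" if valid and no_counterexample > valid / 2 else "tan"


# Swish is convex near 0 (chord lies above) and concave for larger x (chord lies below).
print(assign_template(swish, ((0.0, 0.25), (1.0, 1.25))))   # 2pt
print(assign_template(swish, ((5.0, 5.25), (8.0, 8.25))))   # tan
```

Sampling only finds counter-examples, never proves their absence, which is why the chosen templates still go through the verification step of Sect. 4.3.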

We note that more sophisticated (and likely more expensive) strategies for assigning templates exist; we use this strategy simply because it is efficient. We also note that some boxes in the partition may contain invalid intervals (i.e., $[l, u] \in I_{i,j}$ where $u < l$). These invalid intervals are filtered out during the final verification step described in Sect. 4.3, and thus do not affect the soundness of our algorithm.


4.2 Learning the Function g(l, u) 


In this step, for each box $I_{i,j} \in I_{tan}$, we want to learn a function $g(l, u) = t$ that returns the tangent point for any given interval $[l, u] \in I_{i,j}$, where $t$ will be used to compute the tangent-line form upper bound as defined in Eq. 5. This process is done for all boxes in $I_{tan}$, resulting in a separate $g(l, u)$ for each box $I_{i,j}$. A sub-goal when learning $g(l, u)$ is to maximize the tightness of the resulting upper bound, which in our case means minimizing the volume below the tangent line.

We leverage machine learning techniques (specifically linear regression or a small neural network with ReLU activation) to learn $g(l, u)$, which means we need a procedure to generate training examples. The examples must have the form $((l, u), t)$. To generate the training examples, we (uniformly) sample $[l, u] \in I_{i,j}$, and for each sampled $[l, u]$, we attempt to find a tangent point $t$ whose tangent line represents a tight upper bound of $\sigma(x)$. Then, given the training examples, we use standard machine learning techniques to learn $g(l, u)$.

The crux of our approach is generating the training examples. To generate a single example for a fixed $[l, u]$, we follow two steps: (1) generate upper bound coefficients $a_u, b_u$, and then (2) find a tangent point $t$ whose tangent line is close to $a_u, b_u$. In the following paragraphs, we describe the process for a fixed $[l, u]$, and then discuss the machine learning procedure.


Generating Example Coefficients $a_u, b_u$. Given a fixed $[l, u]$, we aim to generate upper bound coefficients $a_u, b_u$. A good generation procedure has three criteria: (1) the coefficients should be tight for the input interval $[l, u]$, (2) the coefficients should be sound, and (3) the generation should be fast. The first two criteria are intuitive: good training examples will result in a good learned model. The third is to ensure that we can generate a large number of examples in a reasonable amount of time. Unfortunately, the second and third criteria are at odds, because proving soundness is inherently expensive. To ensure a reasonable runtime, we relax the second criterion to probably sound. Thus our final goal is to minimize the volume below $a_u x + b_u$ such that $\sigma(x) \le a_u x + b_u$ probably holds for $x \in [l, u]$.

Fig. 6. Illustration of the sampling and linear programming procedure for computing an upper bound. Shaded green region illustrates the volume below the upper bound. (Color figure online)

Our approach is inspired by prior work [2,33], which formulates a nonlinear optimization problem as a linear program that can be solved efficiently. Our approach samples points $(s_i, \sigma(s_i))$ on the activation function for $s_i \in [l, u]$, which are used to convert the nonlinear constraint $\sigma(x) \le a_u x + b_u$ into a linear one, and then uses volume as the objective (which is linear). For a set $S$ of sample points $s_i \in [l, u]$, the linear program we solve is:

$$\begin{aligned} \text{minimize:}\quad & \text{volume below } a_u x + b_u \\ \text{subj. to:}\quad & \bigwedge_{s_i \in S} \sigma(s_i) \le a_u \cdot s_i + b_u \end{aligned}$$
Fig. 7. Plots of the training examples, smoothed with linear interpolation. On the left: a plot of $((l, u), t)$, and on the right: a plot of $((l, u), a_u)$.


We illustrate this in Fig. 6. Solving the above problem results in $a_u, b_u$, and the prior work [2,33] proved that the solution (theoretically) approaches the optimal and sound $a_u, b_u$ as the number of samples goes to infinity. We use Gurobi [13] to solve the linear program.
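The sampled linear program can be sketched with SciPy's `linprog` standing in for Gurobi (a substitution for illustration only). The sketch assumes that "volume below $a_u x + b_u$" means the area under the line over $[l, u]$, i.e. $\int_l^u (a_u x + b_u)\,dx = a_u \frac{u^2 - l^2}{2} + b_u (u - l)$, which is linear in $a_u, b_u$; the evenly spaced sample points and the name `lp_upper_bound` are illustrative.

```python
import numpy as np
from scipy.optimize import linprog


def swish(x):
    return x / (1.0 + np.exp(-x))   # x * sigmoid(x), vectorized over NumPy arrays


def lp_upper_bound(sigma, l, u, n_samples=100):
    """Probably-sound upper bound a*x + b of sigma on [l, u] from a sampled LP."""
    s = np.linspace(l, u, n_samples)              # sample points s_i (endpoints included)
    # Assumption: "volume below the line" = area under a*x + b over [l, u],
    # which is linear in the unknowns (a, b).
    c = np.array([(u**2 - l**2) / 2.0, u - l])
    # Each constraint sigma(s_i) <= a*s_i + b is rewritten as -s_i*a - b <= -sigma(s_i).
    A_ub = np.column_stack([-s, -np.ones_like(s)])
    b_ub = -sigma(s)
    # a and b are free variables (linprog's default bounds would force them >= 0).
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None), (None, None)],
                  method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    a, b = res.x
    return float(a), float(b)


a_u, b_u = lp_upper_bound(swish, 3.0, 6.0)
print(a_u, b_u)
```

On a convex stretch of $\sigma$ the optimum is the chord through the endpoints, while on a concave stretch it is a line that touches $\sigma$ near the middle of $[l, u]$, which is the kind of example the tangent-point conversion below expects.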


Converting $a_u, b_u$ to a Tangent Line. To use the generated $a_u, b_u$ in the tangent-line form template, we must find a point $t$ whose tangent line is close to $a_u, b_u$. That is, we require that the following condition (almost) holds:

$$(\sigma'(t) = a_u) \wedge (-\sigma'(t) \cdot t + \sigma(t) = b_u)$$


To solve the above problem, we use local optimization techniques (specifically a modified Powell's method [29] implemented in SciPy [41], although most common techniques would work) to find a solution to $\sigma'(t) = a_u$.

We then check that the second conjunct of the above formula almost holds (specifically, we check $|(-\sigma'(t) \cdot t + \sigma(t)) - b_u| < 0.01$). If the local optimization does not converge (i.e., it does not find a $t$ such that $\sigma'(t) = a_u$), or the check on $b_u$ fails, we discard the example and do not use it in training.
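The conversion step can be sketched with SciPy's `minimize(..., method="Powell")`, which implements a modified Powell's method. The sketch assumes Swish as $\sigma$ with its derivative written out by hand, solves $\sigma'(t) = a_u$ by minimizing the squared residual from a caller-supplied starting point `t0`, and uses an illustrative convergence tolerance `tol_a`; only the 0.01 threshold on $b_u$ comes from the text.

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def swish(x):
    return x * sigmoid(x)


def swish_prime(x):
    s = sigmoid(x)
    return s + x * s * (1.0 - s)      # derivative of x * sigmoid(x)


def tangent_point(sigma, dsigma, a_u, b_u, t0, tol_a=1e-6, tol_b=0.01):
    """Return t whose tangent line approximates (a_u, b_u), or None to drop the example.

    Assumption: t0 and tol_a are illustrative; tol_b = 0.01 follows the text.
    """
    res = minimize(lambda v: (dsigma(v[0]) - a_u) ** 2, x0=[t0], method="Powell",
                   options={"xtol": 1e-10, "ftol": 1e-12})
    t = float(res.x[0])
    if not res.success or abs(dsigma(t) - a_u) > tol_a:
        return None                   # local optimization did not solve sigma'(t) = a_u
    intercept = -dsigma(t) * t + sigma(t)
    if abs(intercept - b_u) >= tol_b:
        return None                   # tangent line is too far from the LP intercept
    return t


# Round trip: the coefficients of the tangent line at t = 1 should give back t close to 1.
a_u = swish_prime(1.0)
b_u = -swish_prime(1.0) * 1.0 + swish(1.0)
print(tangent_point(swish, swish_prime, a_u, b_u, t0=0.5))
```

Because $\sigma'(t) = a_u$ can have several solutions for non-monotone derivatives such as Swish's, the starting point matters; the intercept check is what filters out a solution on the wrong branch.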

One may ask: could we simply train a model to directly predict the coefficients $a_u$ and $b_u$, instead of predicting a tangent point and then converting it to the tangent line? The answer is yes; however, this approach has two caveats. The first caveat is that we would lose the inherent tightness that we gain with the tangent-line form: we would no longer have a guarantee that the computed linear bound touches $\sigma(x)$ at any point. The second caveat is that the relationship between $l, u$ and $t$ tends to be close to linear, and thus easier to learn, whereas the relationship between $l, u$ and $a_u$, or between $l, u$ and $b_u$, is highly nonlinear. We illustrate these relationships as plots in Fig. 7. The left graph plots the generated training examples $((l, u), t)$, converted to a smooth function using linear interpolation. We can see that most regions are linear, as shown by the flat sections. The right plot shows $((l, u), a_u)$, where we can see that the center region is highly nonlinear.
Training on the Examples. Using the procedure presented so far, we sample $[l, u]$ uniformly from $I_{i,j}$ and generate the corresponding $t$ for each of them. This results in a training dataset of $r$ examples $D_{train} = \{((l_i, u_i), t_i) \mid i \in \{1..r\}\}$. We then choose one of two models, a linear regression model or a 2-layer, 50-hidden-neuron ReLU network, to become the final function $g(l, u)$. To decide, we train both model types and choose the one with the lowest error, where error is measured as the mean absolute error. We give details below.

A linear regression model is a function $g(l, u) = c_1 \cdot l + c_2 \cdot u + c_3$, where $c_i \in \mathbb{R}$ are coefficients learned by minimizing the squared error, which formally is:

$$\sum_{((l_i, u_i), t_i) \in D_{train}} \left(g(l_i, u_i) - t_i\right)^2 \qquad (6)$$

Finding the coefficients $c_i$ that minimize the above objective has a closed-form solution, so obtaining the optimal coefficients is guaranteed, which is desirable.
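For concreteness, the closed-form fit of Eq. 6 can be sketched with NumPy's least-squares solver. The data below are synthetic values made up for illustration (a target that is exactly linear in $l$ and $u$), not examples produced by the procedure above, and the function names are hypothetical.

```python
import numpy as np


def fit_linear(ls, us, ts):
    """Coefficients (c1, c2, c3) minimizing sum_i (c1*l_i + c2*u_i + c3 - t_i)^2 (Eq. 6)."""
    X = np.column_stack([ls, us, np.ones_like(ls)])
    coeffs, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return coeffs


def predict_linear(coeffs, ls, us):
    c1, c2, c3 = coeffs
    return c1 * ls + c2 * us + c3


# Synthetic (l, u, t) triples drawn from one box, with t exactly linear in (l, u).
rng = np.random.default_rng(0)
ls = rng.uniform(0.0, 0.25, 200)
us = rng.uniform(1.0, 1.25, 200)
ts = 0.3 * ls + 0.6 * us + 0.1
coeffs = fit_linear(ls, us, ts)
mae = np.mean(np.abs(predict_linear(coeffs, ls, us) - ts))   # mean absolute error
print(coeffs, mae)
```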

However, sometimes the relationship between $(l, u)$ and $t$ is nonlinear, and thus using a linear regression model may result in a poor-performing $g(l, u)$, even though the solution is optimal. To capture more complex relationships, we also consider a 2-layer ReLU network where $W_0 \in \mathbb{R}^{2 \times 50}$, $W_1 \in \mathbb{R}^{50 \times 1}$, $b_0 \in \mathbb{R}^{50}$, $b_1 \in \mathbb{R}$, and we have $g(l, u) = \mathrm{ReLU}((l, u)^T \cdot W_0 + b_0) \cdot W_1 + b_1$. The weights and biases are initialized randomly, and then we minimize the squared error (Eq. 6) using gradient descent. While convergence to the optimal weights is not guaranteed in theory, we find that in practice it usually converges.
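The ReLU alternative and the selection rule can be sketched in plain NumPy. This is an illustrative reimplementation, not the authors' code: it uses full-batch gradient descent on the mean (rather than summed) squared error, which only rescales Eq. 6, and the learning rate, epoch count, initialization scale, and the choice to measure the mean absolute error on the training set are all assumptions.

```python
import numpy as np


def fit_relu_net(X, t, hidden=50, lr=0.01, epochs=3000, seed=0):
    """Train g(x) = ReLU(x @ W0 + b0) @ W1 + b1 by full-batch gradient descent.

    Assumption: lr, epochs, and the initialization scale are illustrative.
    """
    rng = np.random.default_rng(seed)
    W0 = rng.normal(0.0, 1.0 / np.sqrt(X.shape[1]), (X.shape[1], hidden))
    b0 = np.zeros(hidden)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
    b1 = np.zeros(1)
    y = t.reshape(-1, 1)
    n = len(X)
    for _ in range(epochs):
        z = X @ W0 + b0                          # pre-activations, shape (n, hidden)
        h = np.maximum(z, 0.0)                   # ReLU
        grad_out = 2.0 * (h @ W1 + b1 - y) / n   # d(mean squared error) / d(output)
        grad_z = (grad_out @ W1.T) * (z > 0)     # backpropagate through ReLU
        W1 -= lr * (h.T @ grad_out)
        b1 -= lr * grad_out.sum(axis=0)
        W0 -= lr * (X.T @ grad_z)
        b0 -= lr * grad_z.sum(axis=0)
    return lambda Xq: (np.maximum(Xq @ W0 + b0, 0.0) @ W1 + b1).ravel()


def fit_linear(X, t):
    """Closed-form least-squares fit of g(l, u) = c1*l + c2*u + c3."""
    A = np.column_stack([X, np.ones(len(X))])
    c, *_ = np.linalg.lstsq(A, t, rcond=None)
    return lambda Xq: np.column_stack([Xq, np.ones(len(Xq))]) @ c


def choose_model(X, t):
    """Train both model types and keep the one with the lower mean absolute error."""
    models = {"linear": fit_linear(X, t), "relu": fit_relu_net(X, t)}
    errors = {name: float(np.mean(np.abs(g(X) - t))) for name, g in models.items()}
    best = min(errors, key=errors.get)
    return best, models[best], errors
```

On exactly linear data the closed-form linear fit wins; on curved data the ReLU network can win instead, which is the trade-off the selection rule is meant to capture.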

We choose these two models because they can capture a diverse set of $g(l, u)$ functions. While we could use other prediction models, such as polynomial regression, a neural network will generally be equally (if not more) expressive. However, we believe exploring other model types or neural network architectures would be an interesting direction for future work.


4.3 Ensuring Soundness of the Linear Approximations 


For a given $I_{i,j}$, we must ensure that its corresponding coefficient generator functions $G_{a_u}(l, u)$ and $G_{b_u}(l, u)$ are sound, or in other words, that the following condition does not hold:

$$\exists [l, u] \in I_{i,j},\ x \in [l, u] \,.\ \sigma(x) > G_{a_u}(l, u) \cdot x + G_{b_u}(l, u)$$

We ensure the above condition does not hold (i.e., the formula is unsatisfiable) by bounding the maximum violation of the clause $\sigma(x) > G_{a_u}(l, u) \cdot x + G_{b_u}(l, u)$, which we formally define as $\Delta(l, u, x) = \sigma(x) - (G_{a_u}(l, u) \cdot x + G_{b_u}(l, u))$. $\Delta$ is positive exactly when the previous clause holds. Thus, if we can compute an upper bound $\Delta_u$, we can set the $\epsilon$ term in $G_{b_u}(l, u)$ to $\Delta_u$ to ensure the clause does not hold, thus making the coefficient generator functions sound.
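Why an upper bound on $\Delta$ suffices can be spelled out in two lines. This derivation assumes the shift $\epsilon$ is added to the intercept generator $G_{b_u}(l, u)$ (an upward shift of the line) and that $\Delta$ is evaluated with $\epsilon = 0$:

$$\Delta(l, u, x) \le \Delta_u \quad \text{for all } [l, u] \in I_{i,j},\ x \in [l, u]$$

$$\Longrightarrow\quad \sigma(x) - \big(G_{a_u}(l, u) \cdot x + G_{b_u}(l, u) + \Delta_u\big) = \Delta(l, u, x) - \Delta_u \le 0,$$

so $\sigma(x) \le G_{a_u}(l, u) \cdot x + G_{b_u}(l, u) + \Delta_u$ holds on all of $I_{i,j}$, and the clause can never be satisfied once $\epsilon = \Delta_u$.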

To compute $\Delta_u$, we solve (i.e., bound) the following optimization problem:

$$\begin{aligned} \text{for:}\quad & l, u, x \in [l_{i,j}, u_{i,j}] \\ \text{maximize:}\quad & \Delta(l, u, x) \\ \text{subj. to:}\quad & l \le u \wedge l \le x \wedge x \le u \end{aligned}$$

where $l_{i,j}$ and $u_{i,j}$ are the minimum lower bound and maximum upper bound, respectively, for any interval in $I_{i,j}$. The above problem can be solved using the general framework of interval analysis [26] and branch-and-prune algorithms [4].

Letting $A_{search} = \{(l, u, x) \mid l, u, x \in [l_{i,j}, u_{i,j}]\}$ be the domain over which we want to bound $\Delta$, we can bound $\Delta$ over $A_{search}$ using interval analysis. In addition, we can improve the bound in two ways: branching (i.e., partitioning $A_{search}$ and bounding $\Delta$ on each subset separately) and pruning (i.e., removing from $A_{search}$ values that violate the constraints $l \le u \wedge l \le x \wedge x \le u$). The tool IbexOpt [5] implements such an algorithm, and we use it to solve the above optimization problem.
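To illustrate why branching tightens an interval bound, here is a toy interval branch-and-bound in plain Python. It is not IbexOpt: it supports only $+$, $-$, and $\times$, omits pruning, ignores floating-point outward rounding (so it is not rigorously sound in floating point), and maximizes the stand-in function $f(x) = x(1 - x)$ rather than $\Delta$; the names `Interval` and `bound_max` are hypothetical. A single interval evaluation over $[0, 1]$ gives the upper bound 1, while repeatedly bisecting the box with the largest upper bound drives the bound toward the true maximum 0.25.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        other = as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __sub__(self, other):
        other = as_interval(other)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __rsub__(self, other):
        return as_interval(other) - self

    def __mul__(self, other):
        other = as_interval(other)
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

    __rmul__ = __mul__

    def width(self):
        return self.hi - self.lo


def as_interval(v):
    return v if isinstance(v, Interval) else Interval(v, v)


def bound_max(f, box, tol=1e-3, max_iter=100_000):
    """Upper bound on max f over a box (tuple of Intervals); toy: no pruning, no rounding control."""
    tie = itertools.count()                 # tie-breaker so boxes are never compared
    heap = [(-f(*box).hi, next(tie), box)]
    for _ in range(max_iter):
        neg_ub, _, cur = heapq.heappop(heap)
        k = max(range(len(cur)), key=lambda d: cur[d].width())
        if cur[k].width() <= tol:
            return -neg_ub                  # largest upper bound among all remaining boxes
        mid = 0.5 * (cur[k].lo + cur[k].hi)
        for half in (Interval(cur[k].lo, mid), Interval(mid, cur[k].hi)):
            child = cur[:k] + (half,) + cur[k + 1:]
            heapq.heappush(heap, (-f(*child).hi, next(tie), child))
    return -heap[0][0]


def f(x):
    return x * (1 - x)


print(f(Interval(0.0, 1.0)).hi)                 # 1.0: naive interval bound
print(bound_max(f, (Interval(0.0, 1.0),)))      # close to, and not below, 0.25
```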

One practical consideration when solving the above optimization problem is the possibility of division by zero. In the two-point template, we have $G_{a_u}(l, u) = \frac{\sigma(u) - \sigma(l)}{u - l}$. While we have the constraint $l \le u$, from an interval analysis perspective $G_{a_u}(l, u)$ is unbounded as $u - l$ goes to 0, and indeed, if we gave the above problem to IbexOpt, it would report that $\Delta$ is unbounded. To account for this, we enforce a minimum interval width of 0.01 by changing $l \le u$ to $0.01 \le u - l$.


4.4 Efficient Lookup of the Linear Bounds 


Due to partitioning $I_x$, we must have a procedure for looking up the appropriate template instance for a given $[l, u]$ at application time. Formally, we need to find the box $I_{i,j}$, which we denote $[l_l, u_l] \times [l_u, u_u]$, such that $l \in [l_l, u_l]$ and $u \in [l_u, u_u]$, and retrieve the corresponding template. Lookup can present a significant runtime overhead if not done with care. One approach is to use a data structure similar to an interval tree or a quadtree [10], the latter of which has $O(\log(n))$ complexity. While the quadtree would be the most efficient for an arbitrary partition of $I_x$ into boxes, we can in fact obtain $O(1)$ lookup for our partition strategy.

We first note that each box $I_{i,j}$ can be uniquely identified by $l_l$ and $u_u$. The point $(l_l, u_u)$ corresponds to the top-left corner of a box in Fig. 5. Thus we build a lookup dictionary, keyed by $(l_l, u_u)$, that maps each box to the corresponding linear bound template. To perform lookup, we exploit the structure of the partition: specifically, each box in the partition is aligned to a multiple of $c_s$. Thus, to look up $I_{i,j}$ for a given $[l, u]$, we view $(l, u)$ as a point on the graph of Fig. 5, and the lookup corresponds to moving leftward and upward from the point $(l, u)$ to the nearest upper-left corner of a box. More formally, we perform lookup by rounding $l$ down to the nearest multiple of $c_s$ and $u$ up to the nearest multiple of $c_s$. The top-left corner can then be used to look up the appropriate template.
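The O(1) lookup can be sketched with Python's built-in dictionary. As a small deviation from the text, this sketch keys the dictionary by the integer multiples $(\lfloor l / c_s \rfloor, \lceil u / c_s \rceil)$ rather than by the corner coordinates $(l_l, u_u)$ themselves, which avoids comparing floating-point keys for equality; the two are related by multiplying by $c_s$. The names `corner_key` and `lookup`, and the string stand-in for a template, are hypothetical.

```python
import math

C_S = 0.25   # box width/height c_s used in the experiments


def corner_key(l, u, c_s=C_S):
    """Integer key of the top-left corner (l_l, u_u) of the box containing [l, u].

    Deviation from the text: integer multiples of c_s instead of float coordinates.
    """
    return (math.floor(l / c_s), math.ceil(u / c_s))   # round l down, u up


def lookup(templates, l, u, c_s=C_S):
    """O(1) template retrieval; None signals a fallback (e.g., [l, u] outside I_x)."""
    return templates.get(corner_key(l, u, c_s))


# Hypothetical table with a single entry: the box l in [0, 0.25], u in [1, 1.25],
# whose top-left corner (0, 1.25) corresponds to the key (0, 5).
templates = {(0, 5): "tangent-line template for l in [0, 0.25], u in [1, 1.25]"}
print(lookup(templates, 0.1, 1.1))
```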


5 Evaluation 


We have implemented our approach as a software tool that synthesizes a linear bound generator function $G(l, u)$ for any given activation function $\sigma(x)$ in the input universe $x \in [l_x, u_x]$. The output is a function that takes $[l, u]$ as input and returns coefficients $a_l, b_l, a_u, b_u$ as output. For all experiments, we use $l_x = -10$, $u_x = 10$, $c_s = 0.25$, and a minimum interval width of 0.01. If we encounter an $[l, u] \not\subseteq [l_x, u_x]$, we fall back to the interval bound propagation of DREAL [11]. After the generator function is synthesized, we integrate it into AUTOLIRPA, a state-of-the-art neural network verification tool, which allows us to analyze neural networks with $\sigma(x)$ as activation functions.


5.1 Benchmarks 


Neural Networks and Datasets. Our benchmarks are eight deep neural 
networks trained on the following two datasets. 


MNIST. MNIST [22] is a set of images of handwritten digits, each of which is labeled with the corresponding written digit. The images are 28 × 28 grayscale images with one of ten written digits. We use a convolutional network architecture with 1568, 784, and 256 neurons in its first, second, and third layer, respectively. We train a model for each of the activation functions described below.


CIFAR. CIFAR [20] is a set of images depicting one of 10 objects (a dog, a truck, etc.), which are hand-labeled with the corresponding object. The images are 32 × 32 pixel RGB images. We use a convolutional architecture with 2048, 2048, 1024, and 256 neurons in the first, second, third, and fourth layers, respectively. We train a model for each of the activation functions described below.


Activation Functions. Our neural networks use one of the activation functions shown in Fig. 8 and defined in Table 1. They are Swish [14,31], GELU [14], Mish [24], LiSHT [32], and AtanSq [31]. The first two are used in language models such as GPT [30], and have been shown to achieve the best performance for some image classification tasks [31]. The third and fourth are variants of the first two, which are shown to have desirable theoretical properties. The last was discovered using automatic search techniques [31] and found to perform on par with the state of the art. We chose these activations because they are representative of recent developments in deep learning research.


Robustness Verification. We evaluate our approach on robustness verification problems. Given a neural network $f : X \subseteq \mathbb{R}^n \to Y \subseteq \mathbb{R}^m$ and an input $x \in X$, we verify robustness by proving that making a small $\rho$-bounded perturbation ($\rho \in \mathbb{R}$) to $x$ does not change the classification. Letting $x[i] \in \mathbb{R}$ be the $i$-th element of $x$, we represent the set of all perturbations as $X \in \mathbb{IR}^n$, where $X = [x[1] - \rho, x[1] + \rho] \times \cdots \times [x[n] - \rho, x[n] + \rho]$. We then compute $Y \in \mathbb{IR}^m$, where $Y = [l_1, u_1] \times \cdots \times [l_m, u_m]$, and, assuming the target class of $x$ is $j$, where $j \in \{1..m\}$, we prove robustness by checking $l_j > u_i$ for all $i \ne j$ and $i \in \{1..m\}$.
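The input box and the final check can be sketched with NumPy, given output bounds $l_i, u_i$ produced by some bound-propagation tool (not shown). The sketch uses 0-based class indices rather than $\{1..m\}$, the function names are hypothetical, and the bound values in the example are made up for illustration.

```python
import numpy as np


def input_box(x, rho):
    """Lower/upper corners of X = [x[1] - rho, x[1] + rho] x ... x [x[n] - rho, x[n] + rho]."""
    x = np.asarray(x, dtype=float)
    return x - rho, x + rho


def is_certified(lower, upper, target):
    """True iff l_target > u_i for every class i != target (0-based indices)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    others = np.arange(len(upper)) != target
    return bool(np.all(lower[target] > upper[others]))


# Made-up output bounds for a 3-class network whose target class is 0.
print(is_certified([0.9, -0.2, -1.0], [1.3, 0.5, 0.1], target=0))   # True
print(is_certified([0.4, -0.2, -1.0], [1.3, 0.5, 0.1], target=0))   # False
```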
Table 1. Definitions of activation functions used in our experiments.

Name | Definition
Swish | $x \cdot \mathrm{sigmoid}(x)$
GELU | $0.5x\left(1 + \tanh\left[\sqrt{2/\pi}\,(x + 0.044715x^3)\right]\right)$
Mish | $x \cdot \tanh\left[\ln(1 + e^x)\right]$
LiSHT | $x \cdot \tanh(x)$
AtanSq | $(\tan^{-1}(x))^2 - x$

Fig. 8. Activation functions used in our experiments.
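For readers who want to reproduce the curves in Fig. 8, the definitions in Table 1 translate directly into NumPy. This is a sketch: using `np.logaddexp(0, x)` for $\ln(1 + e^x)$ is our choice for numerical stability, not something the text specifies.

```python
import numpy as np


def swish(x):
    return x / (1.0 + np.exp(-x))                     # x * sigmoid(x)


def gelu(x):
    # tanh approximation of GELU, exactly as written in Table 1
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mish(x):
    return x * np.tanh(np.logaddexp(0.0, x))          # ln(1 + e^x), computed stably (our choice)


def lisht(x):
    return x * np.tanh(x)


def atansq(x):
    return np.arctan(x) ** 2 - x


xs = np.linspace(-10.0, 10.0, 5)      # spans the input universe [l_x, u_x] = [-10, 10]
for fn in (swish, gelu, mish, lisht, atansq):
    print(fn.__name__, np.round(fn(xs), 3))
```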


For each network, we take 100 random test images, and following prior work [12], we filter out misclassified images. We then take the remaining images and create a robustness verification problem for each one. Again following prior work, we use $\rho = 8/255$ for MNIST networks and $\rho = 1/255$ for CIFAR networks.


5.2 Experimental Results 


Our experiments were designed to answer the following question: How do our synthesized linear approximations compare with other state-of-the-art, hand-crafted linear approximation techniques on novel activation functions? To the best of our knowledge, AUTOLIRPA [46] is the only neural network verification tool capable of handling the activation functions we considered here using static, hand-crafted approximations. We primarily focus on comparing the number of verification problems solved, and we caution against directly comparing the runtime of our approach against that of AUTOLIRPA, as the latter is highly engineered for parallel computation, whereas our approach is not currently engineered to take advantage of parallel computation (although it could be). We conducted all experiments on an 8-core 2.7 GHz processor with 32 GB of RAM.

We present results on robustness verification problems in Table 2. The first column shows the dataset and architecture. The next two columns show the fraction of verification problems solved (out of 1) and the total runtime in seconds for AUTOLIRPA. The next two columns show the same statistics for our approach. The final column compares the output set sizes of AUTOLIRPA and our approach. We first define $|Y|$ as the volume of the (hyper)box $Y$. Then, letting $Y_{auto}$ and $Y_{ours}$ be the output sets computed by AUTOLIRPA and our approach, respectively, $\frac{|Y_{ours}|}{|Y_{auto}|}$ measures the reduction in output set size. In general, $|Y_{ours}| < |Y_{auto}|$ indicates our approach is better because it implies that our approach has more accurately approximated the true output set, and thus $\frac{|Y_{ours}|}{|Y_{auto}|} < 1$ indicates our approach is more accurate.

Table 2. Comparison of the verification results of our approach and AUTOLIRPA.

Dataset | Architecture | AUTOLIRPA [46] % certified | AUTOLIRPA [46] time (s) | Our approach % certified | Our approach time (s) | Output size ratio (ours / AUTOLIRPA)
MNIST | 4-Layer CNN with Swish | 0.34 | 15 | 0.74 | 195 | 0.59
MNIST | 4-Layer CNN with GELU | 0.01 | 359 | 0.70 | 289 | 0.22
MNIST | 4-Layer CNN with Mish | 0.00 | 50 | 0.28 | 236 | 0.29
MNIST | 4-Layer CNN with LiSHT | 0.00 | 15 | 0.11 | 289 | 0.32
MNIST | 4-Layer CNN with AtanSq¹ | - | - | 0.16 | 233 | -
CIFAR | 5-Layer CNN with Swish | 0.03 | 69 | 0.35 | 300 | 0.42
CIFAR | 5-Layer CNN with GELU | 0.00 | 1,217 | 0.29 | 419 | 0.21
CIFAR | 5-Layer CNN with Mish | 0.00 | 202 | 0.29 | 363 | 0.17
CIFAR | 5-Layer CNN with LiSHT | 0.00 | 68 | 0.00 | 303 | 0.09
CIFAR | 5-Layer CNN with AtanSq¹ | - | - | 0.22 | 347 | -

¹ AUTOLIRPA does not have an approximation for $\tan^{-1}$.

We point out three trends in the results. First, our automatically synthesized linear approximations solve at least as many verification problems as AUTOLIRPA on every network, and strictly more on all networks except the CIFAR network with LiSHT, where neither approach certifies any problem. This is because our approach synthesizes a linear approximation specifically for $\sigma(x)$, which results in tighter bounds. Second, AUTOLIRPA takes longer on more complex activations such as GELU and Mish, which have more elementary operations than Swish and LiSHT. This occurs because AUTOLIRPA has more linear approximations to compute (it must compute one for every elementary operation before composing the results together). On the other hand, our approach computes the linear approximation in one step, and thus does not have the additional overhead for the more complex activation functions. Third, our approach always computes a much smaller output set, in the range of 2-10X smaller, which again is a reflection of the tighter linear bounds.
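The output size ratio $\frac{|Y_{ours}|}{|Y_{auto}|}$ compares box volumes, which can be sketched as below. Summing log-widths instead of multiplying widths is our own choice (an assumption, not the paper's), made because the product of many small widths can underflow; the function names are hypothetical and the example boxes are made up.

```python
import numpy as np


def log_volume(lower, upper):
    """log |Y| for the box Y = [lower[1], upper[1]] x ... x [lower[m], upper[m]]."""
    widths = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    return float(np.sum(np.log(widths)))   # assumption: log space avoids underflow


def size_ratio(y_ours, y_auto):
    """|Y_ours| / |Y_auto|; a value below 1 means our output box is smaller."""
    return float(np.exp(log_volume(*y_ours) - log_volume(*y_auto)))


# Made-up 2-D output boxes given as (lower corner, upper corner).
print(size_ratio(([0.0, 0.0], [1.0, 1.0]), ([0.0, 0.0], [2.0, 2.0])))   # approximately 0.25
```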


Synthesis Results. We also report some key metrics about the synthesis procedure. Results are shown in Table 3. The first three columns show the total CPU time for the three steps in our synthesis procedure. We note that all three steps can be heavily parallelized, so the wall-clock time is roughly 1/8 of the reported times on our 8-core machine. The final column shows the fraction of boxes in the partition that were assigned a two-point template (the complement gives the fraction of tangent-line templates).

Table 3. Statistics of the synthesis step in our method.

Activation $\sigma(x)$ | Partition time (s) | Learning time (s) | Verification time (s) | Fraction of two-point boxes
Swish | 81 | 1,762 | 20,815 | 0.45
GELU | 104 | 2,113 | 45,504 | 0.57
Mish | 96 | 2,052 | 38,156 | 0.45
LiSHT | 83 | 1,650 | 61,910 | 0.46
AtanSq | 85 | 1,701 | 18,251 | 0.38


6 Related Work


Most closely related to our work are those that leverage interval-bounding techniques to conduct neural network verification. Seminal works in this area can be thought of either as explicit linear bounding or as linear bounding with some type of restriction (usually for efficiency). Among the explicit linear bounding techniques are the ones used in DEEPPOLY [35], AUTOLIRPA [46], NEURIFY [42], and similar tools [2,7,19,33,34,44,45,47]. On the other hand, techniques using Zonotopes [12,23] and symbolic intervals [43] can be thought of as restricted linear bounding. Such approaches have an advantage in scalability, although they may sacrifice completeness and accuracy. In addition, recent work leverages semi-definite approximations [15], which allow for more expressive, nonlinear lower and upper bounds. Linear approximations are also used in nonlinear programming and optimization [5,40]. However, to the best of our knowledge, none of these prior works attempt to automate the process of crafting the bound generator function $G(l, u)$.

Less closely related are neural network verification approaches based on solving systems of linear constraints [3,8,16,18,38]. Such approaches typically only apply to networks with piecewise-linear activations such as ReLU and max pooling, for which there is little need to automate any part of the verification algorithm's design (at least with respect to the activation functions). They do not handle novel activation functions such as the ones considered in our work. These approaches have the advantage of being complete, although they tend to be less scalable than interval-analysis-based approaches.

Finally, we note that many works build on the initial linear approximation approaches, highlighting the importance of designing tight and sound linear approximations in general [36,39,42].


7 Conclusions 


We have presented the first method for statically synthesizing a function that 
can generate tight and sound linear approximations for neural network activa- 
tion functions. Our approach is example-guided, in that we first generate example 
linear approximations, and then use these approximations to train a prediction 
model for linear approximations at run time. We leverage nonlinear global opti- 
mization techniques to ensure the soundness of the synthesized approximations. 
Our evaluation on popular neural network verification tasks shows that our app- 
roach significantly outperforms state-of-the-art verification tools. 
Abstract. Neural networks have achieved state-of-the-art performance in solving many problems, including many applications in safety/security-critical systems. Researchers have also discovered multiple security issues associated with neural networks. One of them is backdoor attacks, i.e., a neural network may be embedded with a backdoor such that a target output is almost always generated in the presence of a trigger. Existing defense approaches mostly focus on detecting whether a neural network is ‘backdoored’ based on heuristics, e.g., activation patterns. To the best of our knowledge, the only line of work which certifies the absence of backdoors is based on randomized smoothing, which is known to significantly reduce neural network performance. In this work, we propose an approach to verify whether a given neural network is free of backdoors with a certain level of success rate. Our approach integrates statistical sampling as well as abstract interpretation. The experimental results show that our approach effectively verifies the absence of backdoors or generates backdoor triggers.


1 Introduction 


Neural networks are gradually becoming an essential component in many real-life systems, e.g., face recognition [25], medical diagnosis [16], and self-driving cars [3]. Many of these systems are safety- and security-critical. In other words, the neural networks used in these systems are expected not only to operate correctly but also to satisfy security requirements, i.e., they must withstand attacks from malicious adversaries.

Researchers have identified multiple ways of attacking neural networks, including adversarial attacks [33], backdoor attacks [12], and so on. Adversarial attacks apply a small perturbation (e.g., modifying a few pixels of an image input), which is often imperceptible under human inspection, to a given input and cause the neural network to generate a wrong output. To mitigate adversarial attacks, many approaches have been proposed, including robust training [7,22], run-time adversarial sample detection [39], and robustness certification [10]. The most relevant to this work is robustness certification, which aims to verify that a neural network satisfies local robustness, i.e., perturbation within a region (e.g., an $L_\infty$-norm ball) around an input does not change the output. The problem of local robustness certification has been extensively studied in recent years, and many methods and tools have been developed [10,14,15,29-32,40,41].

Backdoor attacks work by embedding a ‘backdoor’ in the neural network so that the neural network works as expected with normal inputs but outputs a specific target output in the presence of a backdoor trigger. For instance, given a ‘backdoored’ image classification network, any image which contains the backdoor trigger will (with high likelihood) be assigned a specific target label chosen by the adversary, regardless of the content of the image. The backdoor trigger can be embedded either by poisoning the training set [12] or by modifying a trained neural network directly [19]. It is easy to see that backdoor attacks raise serious security concerns. For instance, adversaries may use a trigger-containing (a.k.a. ‘stamped’) image to fool a face recognition system and pretend to be someone with high authority [6]. Similarly, a stamped image may be used to trick a self-driving system into misidentifying street signs and acting hazardously [12].

There are multiple active lines of research related to backdoor attacks, e.g., on different ways of conducting backdoor attacks [12,20], different ways of detecting the existence of backdoors [5,9,18,19,38], or mitigating backdoor attacks [17]. Existing approaches are, however, not capable of certifying the absence of backdoors. To the best of our knowledge, the only work that is capable of certifying the absence of backdoors is the work reported in [37], which is based on randomized smoothing during training. Their approach has a huge cost in terms of model accuracy, and even the authors call for alternative approaches for “certifying robustness against backdoor attacks”.

In this work, we propose a method to verify the absence of backdoors with a certain level of success rate (since backdoor attacks in practice are rarely perfect [12,20]). Given a neural network and a constraint on the backdoor trigger (e.g., its size), our method combines statistical sampling and deterministic neural network verification techniques (based on abstract interpretation). If we fail to verify the absence of a backdoor (due to over-approximation), we apply an optimization-based method to generate concrete backdoor triggers.

We conduct experiments on multiple neural networks trained to classify images in the MNIST dataset. These networks are trained with different types of activation functions, including ReLU, Sigmoid, and Tanh. We verify the absence of backdoors under different settings. The experimental results show that we can verify most of the benign neural networks. Furthermore, we can successfully generate backdoor triggers for neural networks trained with backdoor attacks. A slightly surprising result is that we successfully generate backdoor triggers for some of the supposedly benign networks with a reasonably high success rate.

The remainder of the paper is organized as follows. In Sect. 2, we define our problem. In Sect. 3, we present the details of our approach. We show the experimental results in Sect. 4. Section 5 reviews related work, and finally, Sect. 6 concludes.


2 Problem Definition 


In the following, our discussion focuses on the image domain, in particular, on image 
classification neural networks. It should be noted that our approach is not limited to the 
image domain. In general, an image can be represented as a three-dimensional array 
with shape (c, h, w) where c is the number of channels (i.e., 1 for grayscale images and 
3 for color images); h is the height (i.e., the number of rows); and w is the width (i.e., the 
number of columns) of the image. Each element in the array is a byte value (i.e., from 0 to 255) representing a feature of the image. When an image is used in a classification task with a neural network, its feature values are typically normalized into floating-point numbers (e.g., dividing the original values by 255 to get normalized values from 0 to 1). Moreover, the image is transformed into a vector with size $m = c \times h \times w$. In this work, we use the three-dimensional form and the vector form of an image interchangeably. The specific form we use should be clear from the context.

Fig. 1. An example of image classification with a neural network

Given a tuple $(c_i, h_i, w_i)$ representing an index in the three-dimensional form, it is easy to compute the corresponding index $i$ in the vector form using the formula $i = c_i \times h \times w + h_i \times w + w_i$. Similarly, given an index $i$ in the vector form, we compute the tuple $(c_i, h_i, w_i)$ representing the index in the three-dimensional form as follows, where $\div$ denotes integer division.

$$c_i = i \div (h \times w)$$

$$h_i = (i - c_i \times h \times w) \div w$$

$$w_i = i - c_i \times h \times w - h_i \times w$$
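The two conversions can be sketched in a few lines of Python. The function names are ours, not the paper's. The layout is the usual row-major (C-order) flattening, the same one NumPy's ravel_multi_index uses by default.

```python
def to_vector_index(ci: int, hi: int, wi: int, h: int, w: int) -> int:
    """Flatten a (channel, row, column) index: i = ci*h*w + hi*w + wi."""
    return ci * h * w + hi * w + wi


def to_tensor_index(i: int, h: int, w: int) -> tuple[int, int, int]:
    """Invert to_vector_index using integer division (//)."""
    ci = i // (h * w)
    hi = (i - ci * h * w) // w
    wi = i - ci * h * w - hi * w
    return ci, hi, wi


# Example on an MNIST-sized image (c, h, w) = (1, 28, 28).
i = to_vector_index(0, 2, 5, 28, 28)
print(i)                           # 61
print(to_tensor_index(i, 28, 28))  # (0, 2, 5)
```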


An image classification task is to automatically label a given image with one of the predefined labels. Such tasks are often solved using neural networks. Figure 1 shows the typical workflow of an image classification neural network. The task is to assign a label (i.e., from 0 to 9) to a handwritten digit image. Each input is a grayscale image with $1 \times 28 \times 28 = 784$ features.

In this work, we focus on fully connected neural networks and convolutional neural 
networks, which are composed of multiple layers of neurons. The layers include an 
input layer, a set of hidden layers, and an output layer. The number of neurons in the 
input layer equals the number of features in the input image. The number of neurons in 
the output layer equals the number of labels in the classification problem. The number of 
hidden layers as well as the number of neurons in these layers are flexible. For instance, 
the network in Fig. 1 has three hidden layers, each of which contains 10 neurons. 

The input layer simply applies an identity transformation on the vector of the input 
image. Each hidden layer transforms its input vector (i.e., the output vector of the pre- 
vious layer) and produces an output vector for the next layer. Each hidden layer applies 
two different types of transformations, i.e., the first is an affine transformation and the 
second is an activation function transformation. Formally, the two transformations of a hidden layer can be defined as $y = \sigma(A \cdot x + B)$, where $x$ is the input vector, $A$ is the weight matrix, $B$ is the bias vector of the affine transformation, $\cdot$ is the matrix multiplication, $\sigma$ is the activation function, and $y$ is the output vector of the layer. The most popular activation functions include ReLU, Sigmoid, and Tanh. The output layer applies a final affine transformation to its input vector and produces the output vector of the network. A labelling function $L(y) = \arg\max_i y_i$ is then applied to the output vector to return the index of the label with the highest value in $y$.


Fig. 2. Some examples of original images and stamped images: (a) original; (b) stamped

The weights and biases used in the affine transformations are parameters of the 
neural network. In this work, we focus on pre-trained networks, i.e., the weights and 
biases of the networks are already fixed. Formally, a neural network is a function $N: \mathbb{R}^m \to \mathbb{R}^n = f_k \circ \cdots \circ f_1 \circ f_0$, where $m$ is the number of input features; $n$ is the number of labels; each $f_i$ with $0 < i < k$ is the composition of the affine function and the activation function of the $i$-th hidden layer; $f_0$ is the identity transformation of the input layer; and $f_k$ is the last affine transformation of the output layer.
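The composition above can be illustrated with a small NumPy sketch of a fully connected network with ReLU hidden layers and the labelling function L. The function names and the hand-picked weights are hypothetical and only serve to show the order of the operations.

```python
import numpy as np


def relu(v: np.ndarray) -> np.ndarray:
    return np.maximum(v, 0.0)


def forward(x, hidden, output, act=relu):
    """Evaluate N = f_k o ... o f_1 o f_0 on a flattened input vector x.

    hidden: list of (A, B) pairs; hidden layer i computes act(A @ y + B).
    output: one (A, B) pair; the output layer is purely affine.
    """
    y = x                          # f_0: identity on the input layer
    for A, B in hidden:
        y = act(A @ y + B)         # f_i: affine map followed by the activation
    A, B = output
    return A @ y + B               # f_k: final affine transformation


def label(y: np.ndarray) -> int:
    """Labelling function L(y) = argmax_i y_i."""
    return int(np.argmax(y))


# Tiny two-input network with one hidden ReLU layer (weights chosen by hand).
hidden = [(np.array([[1.0, -1.0], [-1.0, 1.0]]), np.zeros(2))]
output = (np.eye(2), np.array([0.1, 0.0]))
y = forward(np.array([0.2, 0.7]), hidden, output)
print(y, label(y))                 # [0.1 0.5] 1
```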


Backdoor Attacks. In [12], Gu et al. show that neural networks are subject to backdoor 
attacks. Intuitively, the idea is that an adversary may introduce a backdoor into the 
network, for instance, by poisoning the training set. To do that, the adversary starts by choosing a pattern, i.e., a backdoor trigger, and stamps the trigger on a set of samples in the training set (e.g., 20%). Figure 2b shows some stamped images, which are obtained by stamping a trigger onto the original images in Fig. 2a. Note that the trigger is a small white square at the top-left corner of the image. A predefined target label is used as the ground truth label for the stamped images. The poisoned training set is then used
to train the neural network. The result is a backdoored network that performs normally 
on clean images (i.e., images without the trigger) but likely assigns the target label 
to any image which is stamped with the trigger. Besides poisoning the training set, 
a backdoor can also be introduced by modifying the parameters of a trained neural 
network directly [19]. 
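The poisoning step can be sketched with NumPy as follows. This is a sketch, not the procedure of [12]. It assumes that training images are stored as an array of shape (num, c, h, w) with values in [0, 1], and that the poisoned samples are chosen uniformly at random. The function name, the 3x3 trigger size, and the default 20% fraction are illustrative choices.

```python
import numpy as np


def poison(images, labels, trigger, target, fraction=0.2, pos=(0, 0), seed=0):
    """Stamp `trigger` on a random `fraction` of the training samples and relabel them.

    images: float array (num, c, h, w); labels: int array (num,); trigger: (c, hs, ws).
    Returns poisoned copies plus the indices of the poisoned samples.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    chosen = rng.choice(len(images), size=int(round(fraction * len(images))), replace=False)
    hp, wp = pos
    _, hs, ws = trigger.shape
    images[chosen, :, hp:hp + hs, wp:wp + ws] = trigger   # stamp: trigger pixels replace the originals
    labels[chosen] = target                               # target label becomes the ground truth
    return images, labels, chosen


# A 3x3 white square in the top-left corner (sizes are illustrative).
clean_x = np.zeros((10, 1, 28, 28))
clean_y = np.arange(10)
x, y, idx = poison(clean_x, clean_y, np.ones((1, 3, 3)), target=7)
print(sorted(idx), y[idx])
```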


Definition 1 (Backdoor trigger). Given a neural network for classifying images with shape $(c, h, w)$, a backdoor trigger is any image $S$ with shape $(c_s, h_s, w_s)$ such that $c_s = c$, $h_s \le h$, and $w_s \le w$.


Formally, a backdoor trigger is any stamp that has the same number of channels. Obvi- 
ously, replacing an input image entirely with a backdoor image with the same size is 
hardly interesting in practice. Thus, we often limit the size of the trigger. Note that the 
trigger can be stamped anywhere on the image. In this work, we assume the same trigger 
is used to attack all images, i.e., the same stamp is stamped at the same position given 
any input. In other words, we do not consider input-specific triggers, i.e., the triggers 
that are different for different images. While some forms of input-specific triggers (e.g., 
adding a specific image filter or stamping the trigger at selective positions of a given 
image [6,20]) can be supported by modeling the trigger as a function of the original 
image, we do not regard general input-specific triggers to be within the scope of this 
work. Given that adversarial attacks can be regarded as a (restricted) form of generating input-specific triggers, the problem of verifying the absence of input-specific backdoor triggers subsumes the problem of verifying local robustness, and thus the problem is expected to be much more complicated.

Given a trigger with shape $(c_s, h_s, w_s)$, let $(h_p, w_p)$ be the position of the top-left corner of the trigger s.t. $h_p + h_s \le h$ and $w_p + w_s \le w$. Given an image $I$ with shape $(c, h, w)$, a backdoor trigger $S$ with shape $(c_s, h_s, w_s)$, and a trigger position $(h_p, w_p)$, a stamped image, denoted as $I_s$, is defined as follows.

$$I_s[c_i, h_i, w_i] = \begin{cases} S[c_i, h_i - h_p, w_i - w_p] & \text{if } h_p \le h_i < h_p + h_s \wedge w_p \le w_i < w_p + w_s \\ I[c_i, h_i, w_i] & \text{otherwise} \end{cases}$$

Intuitively, in the stamped image, the pixels of the stamp replace the corresponding pixels in the original image.
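The stamping operation can be written as a short NumPy function. This sketch assumes images and triggers are arrays of shape (c, h, w) and (c, hs, ws). The name stamp is ours.

```python
import numpy as np


def stamp(image, trigger, hp, wp):
    """Return the stamped image I_s for image I of shape (c, h, w) and trigger S of shape (c, hs, ws)."""
    c, h, w = image.shape
    cs, hs, ws = trigger.shape
    if cs != c or hp < 0 or wp < 0 or hp + hs > h or wp + ws > w:
        raise ValueError("trigger does not fit at (hp, wp)")
    stamped = image.copy()
    # Inside the window hp <= hi < hp + hs, wp <= wi < wp + ws: I_s[ci, hi, wi] = S[ci, hi - hp, wi - wp];
    # everywhere else the original pixel I[ci, hi, wi] is kept.
    stamped[:, hp:hp + hs, wp:wp + ws] = trigger
    return stamped


image = np.full((1, 4, 4), 0.5)
print(stamp(image, np.zeros((1, 2, 2)), 1, 2)[0])
```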

Given a backdoored network, an adversary can perform an attack by feeding an 
image stamped with the backdoor trigger to the network and expecting the network to 
classify the stamped image with the target label. Ideally, given any stamped image, an 
attack on a backdoored network should result in the target label. In practice, experimental results from existing backdoor attacks [6,12,20] show that this is not always the case, i.e., some stamped images may not be classified with the target label. Thus, given a neural network $N$, a backdoor trigger $S$, and a target label $t_s$, we say that $S$ has a success rate of $\theta$ if and only if there exists a position $(h_p, w_p)$ such that the probability of having $L(N(I_s)) = t_s$ for an image $I$ drawn at random from a chosen test set is $\theta$.
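On a finite test set, the success rate of a trigger at one position can be estimated empirically. This is a minimal sketch, assuming the network is available as a callable classify that returns the label L(N(.)) of one image. Because the definition only asks for some position, the second helper takes the maximum over all positions where the trigger fits. Both function names are hypothetical.

```python
import numpy as np


def success_rate(classify, images, trigger, hp, wp, target):
    """Empirical probability that L(N(I_s)) = target over the given test images.

    classify: callable mapping one (c, h, w) image to an integer label (stands in for L(N(.))).
    images:   array of shape (num, c, h, w); trigger: array of shape (c, hs, ws).
    """
    _, hs, ws = trigger.shape
    stamped = images.copy()
    stamped[:, :, hp:hp + hs, wp:wp + ws] = trigger   # same stamp, same position, every image
    return float(np.mean([classify(img) == target for img in stamped]))


def best_success_rate(classify, images, trigger, target):
    """Maximum empirical success rate over all positions where the trigger fits."""
    _, _, h, w = images.shape
    _, hs, ws = trigger.shape
    return max(success_rate(classify, images, trigger, hp, wp, target)
               for hp in range(h - hs + 1) for wp in range(w - ws + 1))
```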

We are now ready to define the problem. Given a neural network $N$, a probability $\theta$, and a trigger shape $(c_s, h_s, w_s)$, the problem of verifying the absence of a backdoor attack with a success rate of $\theta$ against $N$ is to show that there does not exist a backdoor attack on $N$ which has a success rate of at least $\theta$.


3 Verifying Backdoor Absence 


3.1 Overall Algorithm 


The overall approach is shown in Algorithm 1. The inputs include the network $N$, the required success rate $\theta$, a parameter $K$ representing the sampling size, the trigger shape $(c_s, h_s, w_s)$, the target label $t_s$, as well as multiple parameters for hypothesis testing (i.e., a type I error $\alpha$, a type II error $\beta$, and a half-width of the indifference region $\delta$). The idea is to apply hypothesis testing, i.e., the SPRT algorithm [1], with the following two mutually exclusive hypotheses.


- $H_0$: The probability of not having an attack on a set of $K$ randomly selected images is more than $1 - \theta^K$;

- $H_1$: The probability of not having an attack on a set of $K$ randomly selected images is no more than $1 - \theta^K$.


In the algorithm, variables $n$ and $z$ record, respectively, the number of times a set of $K$ random images is sampled and the number of times such a set is shown to be free of a backdoor with a 100% success rate. Note that function verifyX returns SAFE only if there is no backdoor attack on the given set of images $X$ with a 100% success rate, i.e., no trigger such that $L(N(I_s)) = t_s$ for all $I \in X$. It may also return a concrete trigger which successfully attacks every image in $X$. The details of algorithm verifyX are presented in Sect. 3.2.


Algorithm 1: verifyPr($N$, $\theta$, $K$, $(c_s, h_s, w_s)$, $t_s$, $\alpha$, $\beta$, $\delta$)

 1  let $n \leftarrow 0$ be the number of times verifyX is called;
 2  let $z \leftarrow 0$ be the number of times verifyX returns SAFE;
 3  let $p_0 \leftarrow (1 - \theta^K) + \delta$, $p_1 \leftarrow (1 - \theta^K) - \delta$;
 4  while true do
 5      $n \leftarrow n + 1$;
 6      randomly select a set of images $X$ with size $K$;
 7      if verifyX($N$, $X$, $(c_s, h_s, w_s)$, $t_s$) returns SAFE then
 8          $z \leftarrow z + 1$;
 9      else if verifyX($N$, $X$, $(c_s, h_s, w_s)$, $t_s$) returns UNSAFE then
10          if the generated trigger satisfies the success rate then
11              return UNSAFE;
12      if $\frac{p_1^z (1 - p_1)^{n-z}}{p_0^z (1 - p_0)^{n-z}} \le \frac{\beta}{1 - \alpha}$ then
13          return SAFE;      // Accept $H_0$
14      else if $\frac{p_1^z (1 - p_1)^{n-z}}{p_0^z (1 - p_0)^{n-z}} \ge \frac{1 - \beta}{\alpha}$ then
15          return UNKNOWN;   // Accept $H_1$

The loop from lines 4 to 15 in Algorithm 1 keeps randomly selecting and verifying a set of $K$ images using algorithm verifyX until one of the two hypotheses is accepted according to the criteria set by the parameters $\alpha$ and $\beta$ based on the SPRT algorithm. Furthermore, whenever a trigger is returned by algorithm verifyX at line 9, we check whether the trigger reaches the required success rate on the test set, and return UNSAFE if it does. Note that when $H_0$ is accepted, we return SAFE, i.e., we successfully verify the absence of a backdoor attack with a success rate of at least $\theta$. When $H_1$ is accepted, we return UNKNOWN.
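A minimal Python sketch of the SPRT loop in Algorithm 1 follows. It assumes verifyX is supplied as a callback. The names verify_pr, verify_x and sample are hypothetical. The ratio tests of lines 12 and 14 are evaluated in log space, which is equivalent but numerically safer.

```python
import math


def verify_pr(verify_x, sample, theta, K, alpha, beta, delta):
    """Sketch of Algorithm 1 (SPRT). Returns (verdict, number_of_rounds).

    verify_x(X) -> (result, trigger_ok): hypothetical stand-in for verifyX, where
      result is "SAFE", "UNSAFE" or "UNKNOWN" and trigger_ok says whether a returned
      trigger reaches the required success rate on the test set (line 10).
    sample(K): hypothetical helper drawing K random images.
    """
    p0 = (1 - theta ** K) + delta
    p1 = (1 - theta ** K) - delta
    assert 0 < p1 < p0 < 1, "indifference region must lie inside (0, 1)"
    accept_h0 = math.log(beta / (1 - alpha))   # log of the line-12 threshold
    accept_h1 = math.log((1 - beta) / alpha)   # log of the line-14 threshold
    n = z = 0
    while True:
        n += 1
        result, trigger_ok = verify_x(sample(K))
        if result == "SAFE":
            z += 1
        elif result == "UNSAFE" and trigger_ok:
            return "UNSAFE", n
        # log of p1^z (1-p1)^(n-z) / (p0^z (1-p0)^(n-z))
        llr = z * math.log(p1 / p0) + (n - z) * math.log((1 - p1) / (1 - p0))
        if llr <= accept_h0:
            return "SAFE", n      # accept H0
        if llr >= accept_h1:
            return "UNKNOWN", n   # accept H1


def always_safe(X):
    return "SAFE", False


print(verify_pr(always_safe, lambda K: list(range(K)), 0.9, 5, 0.01, 0.01, 0.01))
# ('SAFE', 95)
```

The worked example in this section uses θ = 0.9, K = 5 and α = β = δ = 0.01. Under these settings, an oracle that always answers SAFE makes the test accept H0 after 95 rounds, which matches the 95 rounds reported for target label 0.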

Apart from the success rate $\theta$ and the parameters for hypothesis testing, Algorithm 1 has a particularly interesting parameter $K$, i.e., the number of images to draw at random each time. On the one hand, if $K$ is set to be small, such as 1, it is very likely that algorithm verifyX invoked at line 9 will return UNSAFE, since it is often possible to attack a small set of images, as demonstrated by many adversarial attack methods [4,11,24], i.e., changing a few pixels of an image changes the output of a neural network. As a result, hypothesis $H_1$ is accepted and nothing can be concluded. On the other hand, if $K$ is set to be large, such as 10000, due to the complexity of algorithm verifyX (see Sect. 3.2), it is likely to time out and thus return UNKNOWN, which also leads to an inconclusive result. Furthermore, when $K$ is large, $1 - \theta^K$ will be close to 1 and, as a result, many rounds are needed to accept $H_0$ even if algorithm verifyX returns SAFE. It is thus important to find an effective $K$ value that balances the two aspects. We identify the value of $K$ empirically in Sect. 4 and aim to study the problem further in future work.

Take as an example the network shown in Fig. 1, which is a feed-forward neural network built with the ReLU activation function and three hidden layers. We aim to verify the absence of a backdoor attack with a success rate of 0.9. We take 10000 images of the MNIST test set to evaluate the success rate of a trigger. We set the parameters in Algorithm 1 as follows: $K = 5$ and $\alpha = \beta = \delta = 0.01$. For the target label 0, after 95
rounds, we have enough evidence to accept the hypothesis Ho, which means we have 
evidence that there is no backdoor attack on the network with the target label 0 and 
a success rate of at least 0.9. We have similar results for other target labels, although 
more rounds of tests are required for labels 2, 3, 5, and 8 (i.e., 98 rounds for label 8, 
100 rounds for label 3, 117 rounds for label 5, and 188 rounds for label 2). 


3.2 Verifying Backdoor Absence Against a Set of Images 


Next, we present the details of algorithm verifyX. The inputs include the neural network $N$, a set of images $X$ with shape $(c, h, w)$, a trigger shape $(c_s, h_s, w_s)$, and a target label $t_s$. The goal is to check whether there exists a trigger which successfully attacks every image in $X$. Algorithm verifyX may have three outcomes. One is SAFE, i.e., there is no trigger such that the backdoor attack succeeds on all the images in $X$. Another is UNSAFE, i.e., a trigger that can be used to successfully attack all images in $X$ is generated. The last one is UNKNOWN, i.e., we fail to establish either of the above results.

In the following, we describe one concrete realization of the algorithm based on 
abstract interpretation, as shown in Algorithm 2. At line 1, variable hasUnknown is declared as a flag which is true if and only if we cannot conclude whether there is a successful attack at a certain position. The loop from lines 2 to 15 tries every position for the trigger one by one. Intuitively, variable $\varphi$ is the constraint that must be satisfied by a trigger to successfully attack every image in $X$. At line 3, we initialize $\varphi$ to $\varphi_{pre}$, which is defined as follows:

$$\varphi_{pre} = \bigwedge_{j \in P(h_p, w_p)} lw_j \le x_j \le up_j$$

where $j \in P(h_p, w_p)$ denotes that $j$ is an index (of an image pixel) in the trigger, $x_j$ is a variable denoting the value of the $j$-th pixel, and $lw_j$ and $up_j$ are the (normalized) minimum (e.g., 0) and maximum (e.g., 1) values of feature $j$ according to the input domain specified by the network $N$. Intuitively, $\varphi_{pre}$ requires that the pixels in the trigger lie within their domain.

Given a position, the loop from lines 4 to 10 constructs one constraint $\varphi_I$ for each image $I$, which is the constraint that must be satisfied by the trigger to attack $I$. In particular, at line 5, function attackCondition is called to construct the constraint. We present the details of this function in Sect. 3.3. If $\varphi_I$ is UNSAT (line 6), attacking image $I$ at position $(h_p, w_p)$ is impossible, so we set $\varphi$ to false and break the loop. Otherwise, we conjoin $\varphi$ with $\varphi_I$.

After collecting one constraint from each image, we solve $\varphi$ using a constraint solver. If the result is not UNSAT (i.e., SAT or UNKNOWN), function opTrigger is called to generate a trigger which is successful on all images in $X$ (if possible). Note that due to over-approximation, the model returned by the solver might be spurious. The details of function opTrigger are presented in Sect. 3.4. If a trigger is successfully generated, we return UNSAFE (at line 13, together with the trigger); otherwise, we set hasUnknown to true and continue with the next trigger position. Note that we could return UNKNOWN at line 15 without missing any opportunity to verify the absence of a backdoor. We instead continue with the next trigger position, hoping that a trigger may be generated successfully. After analyzing all trigger positions (and not finding a successful trigger), we return UNKNOWN if hasUnknown is true, and SAFE otherwise.


Algorithm 2: verifyX($N$, $X$, $(c_s, h_s, w_s)$, $t_s$)

 1  let hasUnknown $\leftarrow$ false;
 2  foreach trigger position $(h_p, w_p)$ do
 3      let $\varphi \leftarrow \varphi_{pre}$;
 4      foreach image $I \in X$ do
 5          let $\varphi_I \leftarrow$ attackCondition($N$, $I$, $\varphi_{pre}$, $(c_s, h_s, w_s)$, $(h_p, w_p)$, $t_s$);
 6          if $\varphi_I$ is UNSAT then
 7              $\varphi \leftarrow$ false;
 8              break;
 9          else
10              $\varphi \leftarrow \varphi \wedge \varphi_I$;
11      if solving $\varphi$ results in SAT or UNKNOWN then
12          if opTrigger($N$, $X$, $\varphi$, $(c_s, h_s, w_s)$, $(h_p, w_p)$, $t_s$) returns a trigger then
13              return UNSAFE;
14          else
15              hasUnknown $\leftarrow$ true;
16  return hasUnknown ? UNKNOWN : SAFE;
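The control flow of Algorithm 2 can be sketched in Python with the constraint machinery left abstract. This is a sketch under stated assumptions: attack_condition, solve and op_trigger are hypothetical callbacks standing in for attackCondition, the constraint solver and opTrigger, and the conjunct φ_pre is left implicit. The helper that enumerates trigger positions also shows how many candidates Algorithm 2 must visit for a 3x3 trigger on an MNIST image.

```python
def trigger_positions(h, w, hs, ws):
    """All top-left corners (hp, wp) with hp + hs <= h and wp + ws <= w."""
    return [(hp, wp) for hp in range(h - hs + 1) for wp in range(w - ws + 1)]


def verify_x(positions, images, attack_condition, solve, op_trigger):
    """Control flow of Algorithm 2 with the constraint machinery abstracted away.

    attack_condition(image, pos) -> constraint phi_I, or None when phi_I is UNSAT
    solve(constraints)           -> "SAT", "UNSAT" or "UNKNOWN"
    op_trigger(images, constraints, pos) -> a trigger, or None on failure
    All three callbacks are hypothetical placeholders.
    """
    has_unknown = False
    for pos in positions:
        phi = []                               # conjuncts beyond phi_pre
        for image in images:
            phi_i = attack_condition(image, pos)
            if phi_i is None:                  # this image cannot be attacked here
                phi = None                     # phi <- false
                break
            phi.append(phi_i)                  # phi <- phi AND phi_I
        if phi is not None and solve(phi) in ("SAT", "UNKNOWN"):
            trigger = op_trigger(images, phi, pos)
            if trigger is not None:
                return "UNSAFE", trigger
            has_unknown = True
    return ("UNKNOWN" if has_unknown else "SAFE"), None


positions = trigger_positions(28, 28, 3, 3)
print(len(positions))   # 676 candidate positions for a 3x3 trigger on a 28x28 image
```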


3.3 Abstract Interpretation 


Function attackCondition returns a constraint that must be satisfied for the trigger with shape $(c_s, h_s, w_s)$ to be successful on image $I$ at position $(h_p, w_p)$. In this work, for efficiency reasons, it is built on abstract interpretation techniques [32]. Multiple abstract domains have been proposed to analyze neural networks, such as interval [41], Zonotope [30], and DeepPoly [32]. In this work, we adopt the DeepPoly abstract domain [32], which has been shown to balance precision and efficiency.

In the following, we assume each hidden layer in the network is expanded into two separate layers, one for the affine transformation and the other for the activation function transformation. We use $l$ to denote the number of layers in the expanded network, $n_i$ to denote the number of neurons in layer $i$, and $x^I_{i,j}$ to denote the variable representing the $j$-th neuron in layer $i$ for the image $I$. The constraint $\varphi_I$ returned by function attackCondition($N$, $I$, $\varphi_{pre}$, $(c_s, h_s, w_s)$, $(h_p, w_p)$, $t_s$) is a conjunction of three parts.


$$\varphi_I = pre_I \wedge A_I \wedge post_I$$

where $pre_I$ is the constraint on the input features according to the image $I$, i.e.,

$$pre_I = \varphi_{pre} \wedge \Big(\bigwedge_{j \in P(h_p, w_p)} x^I_{0,j} = x_j\Big) \wedge \Big(\bigwedge_{j \notin P(h_p, w_p)} x^I_{0,j} = I[j]\Big)$$

where $j \notin P(h_p, w_p)$ means that $j$ is not an index (of a pixel) of the trigger; $x^I_{0,j}$ is the variable that represents the input feature $j$ (a.k.a. neuron $j$ at the input layer) of the image $I$; and $I[j]$ is the (normalized) pixel value in the image at index $j$. Intuitively, the constraint $pre_I$ "erases" the pixels in the trigger, i.e., they can now take any value within their range, while the remaining pixels must keep their values from the image. $post_I$ represents the condition for a successful attack. That is, the value of the target label (i.e., $x^I_{l-1,t_s}$) must be greater than the values of all other labels:

$$post_I = \bigwedge_{0 \le j < n_{l-1},\, j \ne t_s} x^I_{l-1,t_s} > x^I_{l-1,j}$$


Fig. 3. An example of abstract interpretation

More interestingly, $A_I$ is a constraint that over-approximates the behavior of the neural network $N$ according to the DeepPoly abstract domain. That is, given the constraint $pre_I$ on the input layer, a set of abstract transformers is applied to compute a linear over-approximation of every neuron in the next layer, then of every neuron in the layer after that, and so on until the output layer. The constraint computed on each neuron $x^I_{i,j}$ is of the form $ge^I_{i,j} \le x^I_{i,j} \le le^I_{i,j} \wedge lw^I_{i,j} \le x^I_{i,j} \le up^I_{i,j}$, where $ge^I_{i,j}$ and $le^I_{i,j}$ are two linear expressions over the variables representing neurons from the previous layer (i.e., layer $i - 1$), and $lw^I_{i,j}$ and $up^I_{i,j}$ are the concrete lower and upper bounds of the neuron. Note that the abstract transformers are different for the activation function layers and the affine layers. As the DeepPoly abstract transformers are not our contribution, we skip them and refer the reader to [32] for details, including their soundness (i.e., that they always over-approximate).
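The DeepPoly transformers are beyond the scope of a short example. To illustrate how concrete bounds lw and up are pushed from the input layer to the output layer, the sketch below substitutes the simpler interval domain [41] mentioned above. It keeps only the concrete bounds and no relational constraints ge and le, so it is sound but generally looser than DeepPoly. The weights are hypothetical, because the weights of the network in Fig. 3 are not given in the text.

```python
import numpy as np


def affine_bounds(A, B, lw, up):
    """Sound element-wise bounds of A @ x + B for lw <= x <= up."""
    A_pos, A_neg = np.maximum(A, 0.0), np.minimum(A, 0.0)
    return A_pos @ lw + A_neg @ up + B, A_pos @ up + A_neg @ lw + B


def relu_bounds(lw, up):
    """ReLU is monotone, so applying it to both bounds keeps them sound."""
    return np.maximum(lw, 0.0), np.maximum(up, 0.0)


def interval_forward(layers, lw, up):
    """Propagate input bounds through [(A, B), ...]; ReLU after every layer but the last."""
    for idx, (A, B) in enumerate(layers):
        lw, up = affine_bounds(A, B, lw, up)
        if idx < len(layers) - 1:
            lw, up = relu_bounds(lw, up)
    return lw, up


# Feature 0 is a trigger pixel in [0, 1]; feature 1 is fixed at 0.5 (hypothetical weights).
layers = [(np.array([[1.0, 1.0], [1.0, -1.0]]), np.zeros(2)),
          (np.eye(2), np.zeros(2))]
lw_out, up_out = interval_forward(layers, np.array([0.0, 0.5]), np.array([1.0, 0.5]))
print(lw_out, up_out)   # [0.5 0. ] [1.5 0.5]
```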


Example 1. Since it is too complicated to show the details of applying abstract interpretation to the neural network shown in Fig. 1, we instead construct a simple example, shown in Fig. 3, to illustrate how it works. There are two features in this artificial image $I$: $x^I_{0,1}$ has a constant value of 0.5, and $x^I_{0,0}$ is the trigger, whose value ranges from 0 to 1. That is, $pre_I = 0 \le x^I_{0,0} \le 1 \wedge x^I_{0,1} = 0.5$. After expanding the hidden layers, the network has 6 layers, each of which has 2 neurons. Applying the DeepPoly abstract transformers from the input layer all the way to the output layer, we obtain the abstract states for the last layer. Further, assume that the target label is 0. The constraint $post_I$ is thus $post_I = x^I_{5,0} > x^I_{5,1}$. Solving the constraints returns SAT with $x^I_{0,0} = 0$. Indeed, with the stamped image $I_s = [0, 0.5]$, the output vector is $[1, 0]$. We have thus identified a successful attack on the target label 0.


Optimization. Note that at line 6 of Algorithm 2, for each constraint $\varphi_I$, we perform a quick check to see whether the constraint is satisfiable. If $\varphi_I$ is UNSAT, we can ignore the remaining images and analyze the next trigger position, which speeds up the process. One naive approach is to call a solver on $\varphi_I$, which would incur significant overhead since this check may happen many times. To reduce the overhead, we propose a simple procedure to quickly check whether $\varphi_I$ is UNSAT based solely on its abstract states at the output layer. That is, we check the satisfiability of the following constraint instead:

$$\bigwedge_{0 \le j < n_{l-1},\, j \ne t_s} up^I_{l-1,t_s} > lw^I_{l-1,j}$$

Recall that $up^I_{l-1,t_s}$ is the concrete upper bound of the neuron $t_s$, and $lw^I_{l-1,j}$ is the concrete lower bound of the neuron $j$ at the output layer. Thus, intuitively, we check whether the concrete upper bound of the target label $t_s$ is larger than the concrete lower bound of every other label. If this constraint is UNSAT, it is impossible to have the target label as the result, and thus the attack would fail on the image $I$. We call the solver on $\varphi_I$ only if the above procedure does not return UNSAT. Furthermore, the loop in Algorithm 2 can be parallelized straightforwardly, i.e., by using a separate process to verify against each trigger position. Whenever a trigger is found by any of the processes, the whole algorithm is interrupted.
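The quick check needs only the concrete bounds at the output layer, so it fits in one NumPy function. In this sketch the function name is ours, and the bounds are assumed to be given as two arrays indexed by label.

```python
import numpy as np


def quick_unsat(lw_out, up_out, target):
    """True if post_I is provably unsatisfiable from the output-layer bounds alone.

    The attack needs x[target] > x[j] for every j != target; if some other label's
    concrete lower bound is already >= the target's concrete upper bound, it cannot happen.
    """
    others = np.delete(lw_out, target)
    return bool(np.any(up_out[target] <= others))


lw_out, up_out = np.array([0.5, 0.0]), np.array([1.5, 0.5])
print(quick_unsat(lw_out, up_out, 0), quick_unsat(lw_out, up_out, 1))   # False True
```

When quick_unsat returns False, φ_I is still handed to the solver, because bounds that overlap do not prove that the attack is possible.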


3.4 Generating Backdoor Triggers 


In the following, we present the details of function opTrigger, which aims to generate a trigger $S$ with shape $(c_s, h_s, w_s)$ at position $(h_p, w_p)$ that successfully attacks every image $I$ in $X$. If the solver applied to $\varphi$ at line 11 of Algorithm 2 returns a model that satisfies $\varphi$, we first check whether the model is indeed a trigger that successfully attacks every image in $X$. Due to the over-approximation of abstract interpretation, the model might be a spurious trigger. If it is a real trigger, we return the model. Otherwise, we employ an optimization-based approach to generate a trigger.

Given a network $N$, an image $I$, a target label $t_s$, and a position $(h_p, w_p)$, let $I_s$ be the stamped image generated from $I$ by stamping the trigger at position $(h_p, w_p)$. We generate a backdoor trigger $S$ by minimizing the following loss function.

$$loss(N, I, S, (h_p, w_p), t_s) = \begin{cases} 0 & \text{if } n_s > n_o \\ n_o - n_s + \epsilon & \text{otherwise} \end{cases}$$

where $n_s = N(I_s)[t_s]$ is the output value of the target label, $n_o = \max_{j \ne t_s} N(I_s)[j]$ is the maximum value of any label other than the target label, and $\epsilon$ is a small positive constant. Note that the trigger $S$ is the only variable in the loss function. Intuitively, the loss function returns 0 if the attack on $I$ by the trigger is successful. Otherwise, it returns a quantitative measure of how far the attack is from succeeding on $I$. Given a set of images $X$, the loss function is defined as the sum of the losses of the images in $X$: $loss(N, X, S, (h_p, w_p), t_s) = \sum_{I \in X} loss(N, I, S, (h_p, w_p), t_s)$. The following optimization problem is then solved to find a trigger which successfully attacks all images in $X$: $\arg\min_S loss(N, X, S, (h_p, w_p), t_s)$.
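The loss and its minimization can be sketched in NumPy. The paper solves the optimization problem with scipy. To keep this sketch dependency-light, it uses a plain projected gradient descent with finite-difference gradients, which is a simplified stand-in and not the authors' solver. The value eps = 1e-6, the learning rate, the step count and the toy network are assumptions.

```python
import numpy as np


def image_loss(net, image, trigger, hp, wp, target, eps=1e-6):
    """loss(N, I, S, (hp, wp), t_s): 0 if the stamped image is labelled target, else n_o - n_s + eps."""
    _, hs, ws = trigger.shape
    stamped = image.copy()
    stamped[:, hp:hp + hs, wp:wp + ws] = trigger
    out = net(stamped)
    n_s = out[target]                          # score of the target label
    n_o = np.max(np.delete(out, target))       # best competing label
    return 0.0 if n_s > n_o else float(n_o - n_s + eps)


def set_loss(net, images, trigger, hp, wp, target):
    """loss(N, X, S, (hp, wp), t_s): sum of the per-image losses."""
    return sum(image_loss(net, img, trigger, hp, wp, target) for img in images)


def op_trigger(net, images, shape, hp, wp, target, lr=0.1, steps=200, fd=1e-4):
    """Minimise set_loss over S in [0, 1] by projected gradient descent.

    Gradients use central finite differences, so `net` may be any callable mapping a
    (c, h, w) array to a vector of label scores. Returns a trigger or None.
    """
    trigger = np.zeros(shape)
    for _ in range(steps):
        if set_loss(net, images, trigger, hp, wp, target) == 0.0:
            return trigger                      # succeeds on every image
        grad = np.zeros(shape)
        for idx in np.ndindex(*shape):
            bump = np.zeros(shape)
            bump[idx] = fd
            grad[idx] = (set_loss(net, images, trigger + bump, hp, wp, target)
                         - set_loss(net, images, trigger - bump, hp, wp, target)) / (2 * fd)
        trigger = np.clip(trigger - lr * grad, 0.0, 1.0)   # project back to the pixel domain
    return trigger if set_loss(net, images, trigger, hp, wp, target) == 0.0 else None


# Hypothetical toy "network": label 1 scores the sum of the top-left 2x2 pixels, label 0 scores 1.0.
def toy_net(x):
    return np.array([1.0, x[0, :2, :2].sum()])


images = np.zeros((3, 1, 4, 4))
S = op_trigger(toy_net, images, (1, 2, 2), 0, 0, target=1)
print(S)
```

Because the loss is zero exactly when every stamped image receives the target label, the search can stop as soon as the loss reaches zero. There is no need to run it to convergence.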


3.5 Correctness and Complexity 


Lemma 1. Given a neural network $N$, a set of images $X$, a trigger shape $(c_s, h_s, w_s)$, and a target label $t_s$, Algorithm 2 (1) returns SAFE only if there is no backdoor attack which is successful on all images in $X$ with the provided trigger shape and target label; and (2) returns UNSAFE only if there exists a backdoor attack which is successful on all images in $X$ with the provided trigger shape and target label.


Proof. By [32], function attackCondition always returns a constraint which over-approximates the constraint that must be satisfied for the trigger to be successful on image $I$. Furthermore, Algorithm 2 returns SAFE only at line 16, i.e., only if, at every trigger position, the constraint that must be satisfied to attack all images in $X$ is UNSAT. Thus, (1) is established. (2) is trivially established since we only return UNSAFE when a trigger that is successful on every provided image is generated.


The following establishes the soundness of our approach. 


Theorem 1. Given a neural network $N$, a success rate $\theta$, a target label $t_s$, a trigger shape $(c_s, h_s, w_s)$, a type I error $\alpha$, a type II error $\beta$, and a half-width of the indifference region $\delta$, Algorithm 1 returns SAFE only if there is sufficient evidence (subject to type I error $\alpha$ and type II error $\beta$) that there is no backdoor attack with a success rate of at least $\theta$ with the provided trigger shape and target label at the specified significance level.


Proof. If there is a backdoor attack with a success rate no less than $\theta$, then given a set of $K$ randomly selected images, the probability of having an attack is no less than $\theta^K$ (since there is at least one backdoor attack with a success rate no less than $\theta$, and possibly more). Thus, the probability of not having an attack is no more than $1 - \theta^K$. By the correctness of the SPRT algorithm, Algorithm 1 returns SAFE only if there is sufficient evidence that $H_0$ is true, i.e., that the probability of not having an attack on a set of $K$ randomly selected images is more than $1 - \theta^K$. This implies that there is sufficient evidence that there is no backdoor attack with a success rate no less than $\theta$. The theorem holds.


Furthermore, it is trivial to show that Algorithm 1 returns UNSAFE only if there exists a backdoor attack which has a success rate of at least $\theta$ with the provided trigger shape and target label.

In the following, we briefly discuss the complexity of our approach. It is straightfor- 
ward to see that Algorithm 2 always terminates if a timeout is imposed on solving the 
constraints and the optimization problems. Since we can always set a tight time limit on 
solving the constraints and the optimization problems, the complexity of the algorithm 
is determined mainly by the complexity of function attackCondition, which in turn 
is determined by the complexity of abstract interpretation. The complexity of applying abstract interpretation with the DeepPoly abstract domain is $O(l^2 \times n_{max}^3)$, where $l$ is the number of layers and $n_{max}$ is the maximum number of neurons in any of the layers. Let $K$ be the number of images in $X$. Note that the number of trigger positions is $O(h \times w)$, i.e., the size of an image. The best-case complexity of Algorithm 2 is $O(l^2 \times n_{max}^3 \times h \times w)$ and the worst-case complexity is $O(l^2 \times n_{max}^3 \times K \times h \times w)$. We remark that in practice, $l$ typically ranges from 1 to 20; $n_{max}$ is often advised to be no more than the input size (e.g., usually from dozens to thousands); $K$ ranges from a few to hundreds; and $h \times w$ depends on the image resolution (e.g., from hundreds to millions). Thus, in general, Algorithm 2 could be time-consuming in practice, and we anticipate further optimization in future work.

The complexity of Algorithm 1 is the complexity of Algorithm 2 times the complex- 
ity of the SPRT algorithm. The complexity of the SPRT algorithm is in general hard to 
quantify and we refer the readers to [1] for a detailed discussion. 


3.6 Discussion 


Our approach is designed to verify the absence of input-agnostic (i.e., not input-specific) backdoor attacks as presented in Sect. 2. In the following, we briefly review other backdoor attacks and discuss how to extend our approach to support them.
In [12], Gu et al. described a backdoor attack which, instead of forcing the network to classify any stamped image with the target label, only alters the label if the original image has a specific ground truth label $t_i$ (e.g., Bob with the trigger will activate the backdoor and be classified as Alice the manager). Our verification approach can be easily adapted to verify the absence of this attack by focusing on images with label $t_i$ in Algorithms 1 and 2.

Another attack proposed in [12] works by reducing the performance (e.g., accuracy) of the neural network on the images with a specific ground truth label $t_i$, i.e., given an image with ground truth label $t_i$, the network will classify the stamped image with some label $t_s \ne t_i$. The attack can be similarly handled by focusing on images with ground truth label $t_i$, although due to the disjunction introduced by $t_s \ne t_i$, the constraints are likely to be harder to solve. That is, we can focus on images with ground truth label $t_i$ in Algorithm 2, and define an attack to be successful if $L(N(I_s)) \ne t_i$ is satisfied.

In [19], Liu et al. proposed to use backdoor triggers with different shapes (i.e., not just in the form of a square or a rectangle). If the user is aware of the shape of the backdoor trigger, a different trigger shape can be used as input for Algorithms 1 and 2, and the algorithms would work to verify the absence of such a backdoor. Alternatively, the user can choose a square-shaped backdoor trigger that is large enough to cover the actual backdoor trigger, in which case our algorithms would remain sound, although they might be inconclusive if the trigger is too big.

Multiple groups [2,20,28,35] proposed the idea of poisoning only those samples in the training data which have the same ground truth label as the target label to improve the stealthiness of the backdoor attack. This type of attack is designed to evade human inspection of the training data, and thus does not affect our verification algorithms.

In this work, we consider a specific type of stamping, i.e., the backdoor trigger replaces part of the original clean image. Multiple groups [6,19] proposed the use of the blending operation as a way of 'stamping', i.e., the features of the backdoor trigger are blended with the features of the original images with some coefficient $\alpha$. This is a form of input-specific backdoor, since the resulting trigger is different for different images. To handle this kind of backdoor attack, one way is to modify the constraint $pre_I$ according to the blending operation (assuming that $\alpha$ is known). Since the blending operation proposed in [6,19] is linear, we expect this would not introduce additional complexity to our algorithms.
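The following sketch shows a blended stamp in NumPy. It is not the exact operation of [6,19]. It assumes the blend is confined to a trigger window (a window covering the whole image gives a full-image blend), and here α denotes the blending coefficient, not the type I error. Each stamped pixel is an affine function of the corresponding trigger pixel, which is why the modified pre_I stays linear. With α = 1, the operation reduces to the replacement stamping of Sect. 2.

```python
import numpy as np


def blend_stamp(image, pattern, alpha, hp=0, wp=0):
    """Blend `pattern` into `image` inside the window at (hp, wp): I_s = (1 - alpha) * I + alpha * S.

    Each stamped pixel is an affine function of the corresponding pattern pixel, so the
    equality x_{0,j} = (1 - alpha) * I[j] + alpha * x_j remains a linear constraint in pre_I.
    """
    _, hs, ws = pattern.shape
    out = image.copy()
    window = out[:, hp:hp + hs, wp:wp + ws]
    out[:, hp:hp + hs, wp:wp + ws] = (1 - alpha) * window + alpha * pattern
    return out


image = np.full((1, 4, 4), 0.5)
pattern = np.ones((1, 2, 2))
blended = blend_stamp(image, pattern, 0.2)
print(blended[0])
```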

Input-specific triggers, in general, may pose a threat to our approach. First, some input-specific triggers [19,20] cover the whole image, which is likely to make our approach inconclusive due to false alarms resulting from over-approximation. Second, it may not be easy to model some of the input-specific triggers in our framework. For instance, Liu et al. [20] recently proposed to use reflection to create stamped images that look natural. Modeling the 'stamping' operation for this kind of attack would require us to know where the reflection is in the image, which is highly non-trivial. However, it should also be noted that input-specific triggers are often not as effective as input-agnostic triggers, e.g., the reflection-based attack reported in [20] is hard to reproduce. Furthermore, as
discussed in Sect. 2, backdoor attack with input-specific triggers is an attacking method 
that is more powerful than adversarial attacks, and the problem of verifying the absence 
of backdoor attack with input-specific triggers is not yet clearly defined. 
4 Implementation and Evaluation


We have implemented our approach as a self-contained analysis engine in the Socrates 
framework [26]. We use Gurobi [13] to solve the constraints and use scipy [36] to solve 
the optimization problems. 

We collect a set of 51 neural networks. 45 of them are fully connected networks trained on the MNIST training set (i.e., a standard dataset which contains black-and-white images of handwritten digits). These networks have a number of hidden layers ranging from
3 to 5. For each network, the number of neurons in each of its hidden layers ranges from 
10 to 50, i.e., 10, 20, 30, 40, or 50. To evaluate our approach on neural networks built 
with different activation functions, each activation function (i.e., ReLU, Sigmoid, and 
Tanh) is used in 15 of the neural networks. Among the remaining six networks, three 
of them are bigger fully connected networks adopted from the benchmarks reported 
in [32]. They are all built with the ReLU activation function. For convenience, we name 
the networks in the form of f_k_n, where f is the name of the activation function, k
is the number of hidden layers, and n is the number of neurons in each hidden layer. 
The remaining three networks are convolutional networks (which are often used in face 
recognition systems) adopted from [32]. Although they have the same structure, i.e., 
each of them has two convolutional hidden layers and one fully connected hidden layer, 
they are trained differently. One is trained in the normal way; one is trained using Dif- 
fAI [22], and the last one is trained using projected gradient descent [7]. These training 
methods are developed to improve the robustness of neural networks against adversarial 
attacks. Our aim is thus to evaluate whether they help to prevent backdoor attacks as 
well. We name these networks conv, conv_diffai, and conv_pgd. 
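The count of 45 small networks fits a full grid of three activation functions, three depths, and five widths (3 × 3 × 5 = 45, i.e., 15 per activation function). The sketch below enumerates that grid under the naming convention f_k_n, assuming the depths are exactly 3, 4, and 5 hidden layers (the text gives a range, and Sigmoid_4_10 is one of the networks). It only illustrates the naming scheme and is not part of the tool.

```python
from itertools import product

# Assumed grid behind the 45 small MNIST networks:
# 3 activation functions x 3 depths x 5 widths.
ACTIVATIONS = ["ReLU", "Sigmoid", "Tanh"]
HIDDEN_LAYERS = [3, 4, 5]
WIDTHS = [10, 20, 30, 40, 50]


def small_network_names():
    """Return names f_k_n: activation f, k hidden layers, n neurons per hidden layer."""
    return [f"{f}_{k}_{n}" for f, k, n in product(ACTIVATIONS, HIDDEN_LAYERS, WIDTHS)]


names = small_network_names()
print(len(names))                                       # 45
print(sum(name.startswith("Tanh_") for name in names))  # 15
print(names[:2])                                        # ['ReLU_3_10', 'ReLU_3_20']
```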

We verify the networks against a backdoor trigger with shape (1, 3, 3). All the networks are trained using clean data since we focus on verifying the absence of backdoor attacks. They all have a precision of at least 90%, except Sigmoid_4_10 and Sigmoid_5_10, which have a precision of 81% and 89%, respectively. In the following, we answer multiple research questions. All the experiments are conducted on a machine with a 3.1 GHz 16-core CPU and 64 GB RAM. All models and experiment details are available at [27].
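For MNIST's 1 × 28 × 28 inputs, a (1, 3, 3) trigger fits at (28 − 3 + 1)² = 26 × 26 = 676 positions, which is why verification has to consider many trigger positions per image. The sketch below assumes that stamping simply overwrites the pixels covered by the trigger (the stamping operator itself is defined earlier in the paper and may differ); `stamp` and `trigger_positions` are hypothetical helper names.

```python
from typing import List, Tuple

Image = List[List[float]]  # single-channel image as rows of pixel values in [0, 1]


def stamp(image: Image, trigger: Image, row: int, col: int) -> Image:
    """Return a copy of `image` whose patch at (row, col) is overwritten by `trigger`."""
    out = [list(r) for r in image]
    for i, trigger_row in enumerate(trigger):
        for j, value in enumerate(trigger_row):
            out[row + i][col + j] = value
    return out


def trigger_positions(height: int, width: int, th: int, tw: int) -> List[Tuple[int, int]]:
    """All top-left corners at which a th x tw trigger fits inside the image."""
    return [(r, c) for r in range(height - th + 1) for c in range(width - tw + 1)]


blank = [[0.0] * 28 for _ in range(28)]  # MNIST images are 1 x 28 x 28
white = [[1.0] * 3 for _ in range(3)]    # a (1, 3, 3) trigger with the channel dim dropped
positions = trigger_positions(28, 28, 3, 3)
print(len(positions))                                # 676
print(sum(map(sum, stamp(blank, white, 25, 25))))   # 9.0
```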


RQ1: Is our realization of verifyX effective? This question is meaningful as our approach relies on Algorithm verifyX. To answer this question, for each network, we select the first 100 images in the test set (i.e., K = 100 for Algorithm 1, which is more than sufficient) and then apply Algorithm verifyX with these images and each of the labels, i.e., 0 to 9. In total, we have 510 verification tasks. For each network, we run 10 processes in parallel, each of which verifies a separate target. The only exception is the network ReLU_3_1024: due to its complexity, we run only five parallel processes, since each process consumes a lot of resources. In each verification process, we filter out the images that are misclassified by the network as well as the images that are already classified as the target label.
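The image selection used in RQ1 can be sketched as follows; it runs once per (network, target) pair, so 51 networks × 10 labels give the 510 tasks. `select_images` and the toy predictor are hypothetical stand-ins for the actual network and MNIST test set.

```python
from typing import Callable, List, Sequence, TypeVar

X = TypeVar("X")


def select_images(images: Sequence[X], labels: Sequence[int],
                  predict: Callable[[X], int], target: int, k: int = 100) -> List[X]:
    """Take the first k test images, then drop those that are misclassified
    or already classified as the target label."""
    kept = []
    for x, y in zip(images[:k], labels[:k]):
        pred = predict(x)
        if pred == y and pred != target:
            kept.append(x)
    return kept


def toy_predict(x: int) -> int:
    """Toy 'network': the input is a digit, predicted correctly except 7 -> 1."""
    return 1 if x == 7 else x


digits = list(range(10))
print(select_images(digits, digits, toy_predict, target=3))  # [0, 1, 2, 4, 5, 6, 8, 9]
```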

Fig. 4. The results of verifyX

Figure 4 shows the results. The x-axis shows the groups of networks, e.g., ReLU_3 means the five fully connected networks that use the ReLU activation function and have three hidden layers; 3 Full and 3 Conv mean the three fully connected and the three convolutional networks adapted from [32], respectively. The y-axis shows the number of (network, target) pairs. Note that the groups contain different numbers of pairs, i.e., the maximum value for each small network group is 50, and the maximum value for each of the last two groups is 30. First, we successfully verify 455 out of 510 verification tasks (i.e., 89%), i.e., the neural network is safe with respect to the selected images. It is encouraging that the verified tasks include all models adopted from [32], which are considerably larger (e.g., with 1024 neurons in each layer) and more complex (i.e., convolutional networks). Second, some networks are not proved to be safe for some target labels. This is either because there is indeed a backdoor trigger that we fail to identify (through optimization), or because verification fails due to the over-approximation introduced by abstract interpretation. Lastly, for the same structure (i.e., the same number of hidden layers and the same number of neurons in each hidden layer), the networks using the ReLU and Sigmoid activation functions are verified to be safe more often than those using the Tanh activation function. This is most likely due to the difference in the precision of the abstract transformers for these functions.


RQ2: Can we verify the absence of backdoor attacks with a certain level of success rate? To answer this question, we evaluate our approach on six networks used in RQ1, i.e., ReLU_3_10, ReLU_5_50, Sigmoid_3_10, Sigmoid_5_50, Tanh_3_10, and Tanh_5_50. These networks are chosen to cover a wide range of numbers of hidden layers and of neurons per layer, as well as different activation functions. Note that due to the high complexity of Algorithm 1 (which potentially applies Algorithm 2 hundreds of times), running Algorithm 1 on all the networks evaluated in RQ1 would require an overwhelming amount of resources. Furthermore, since there is no existing work on backdoor verification, we do not have any baseline to compare with.

Recall that Algorithm 1 has two important parameters, K and θ, both of which potentially have a significant impact on the verification result. We thus run each network with four different settings, in which the number of images K is set to either 5 or 10, and the success rate θ is either 0.8 or 0.9. With 10 target labels, this gives a total of 240 verification tasks for this experiment. Note that these two K values were selected based on preliminary experiments.
Fig. 5. Verification results


We use all the 10000 images in the test set as the image population and randomly choose K images in each round of testing. When a trigger is generated, its success rate is validated on the images in the test set (after the above-mentioned filtering). As in RQ1, we run each network with 10 parallel processes, each of which verifies a separate target. As the SPRT algorithm may take a very long time to terminate, we set a timeout for each verification task, i.e., 2 h for the networks with three hidden layers and 10 h for the networks with five hidden layers.
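Because Algorithm 1 decides between its hypotheses with a sequential probability ratio test (SPRT), the number of rounds it needs, and hence whether it hits the timeout, depends on how clearly the observations favour one hypothesis. The sketch below is a generic Wald SPRT on a Bernoulli parameter, with assumed error bounds alpha and beta; how Algorithm 1 turns one round (drawing K images and running Algorithm 2) into an observation and how it derives its hypotheses from θ is defined earlier in the paper and is not reproduced here.

```python
import math
import random
from typing import Callable, Tuple


def sprt(sample: Callable[[], bool], p0: float, p1: float,
         alpha: float = 0.05, beta: float = 0.05,
         max_samples: int = 10_000) -> Tuple[str, int]:
    """Wald's SPRT for a Bernoulli parameter p: H0: p = p0 versus H1: p = p1 (p0 < p1).

    alpha bounds the probability of wrongly accepting H1, beta that of wrongly
    accepting H0. Returns the accepted hypothesis and the number of samples used.
    """
    lower = math.log(beta / (1 - alpha))  # accept H0 once the log-likelihood ratio drops here
    upper = math.log((1 - beta) / alpha)  # accept H1 once it rises here
    llr = 0.0
    for n in range(1, max_samples + 1):
        if sample():
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr <= lower:
            return "H0", n
        if llr >= upper:
            return "H1", n
    return "undecided", max_samples  # plays the role of a timeout


print(sprt(lambda: True, p0=0.1, p1=0.2))  # ('H1', 5)
rng = random.Random(0)
print(sprt(lambda: rng.random() < 0.95, p0=0.8, p1=0.9))
```

When observations sit between p0 and p1, the log-likelihood ratio drifts slowly and the test needs many samples, which mirrors the back-and-forth behaviour discussed below.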

The results are shown in Fig. 5. The x-axis shows the networks, and the y-axis shows the number of verified pairs of network and target label. We make multiple observations based on the experimental results. First, a quick glance shows that, with the same structure and hypothesis testing parameters, more networks built with the ReLU activation function are verified than those built with the Sigmoid and Tanh functions. Second, the best result is achieved with K = 5 and θ = 0.9. With these parameter values, we can verify that the three networks ReLU_3_10, ReLU_5_50, and Sigmoid_3_10 are safe with respect to all the target labels and that the network Sigmoid_5_50 is safe with respect to nine out of 10 target labels. If we keep the success rate at 0.9 and increase the number of images K from 5 to 10, the number of verified cases for the network Sigmoid_5_50 decreases. This is because when we increase the number of images that must be attacked successfully together, the probability of not having the attack increases, which means that we need more rounds of testing to confirm the hypothesis H₀, and so the verification process for Sigmoid_5_50 times out before reaching a conclusion. We make a similar observation when we keep K at 5 but decrease the success rate from 0.9 to 0.8. When the success rate decreases, the probability of not having the attack increases, which requires more tests to confirm the hypothesis H₀. As a result, for all four of these networks, multiple verification tasks time out before reaching a conclusion. However, there is an exception when we keep the success rate at 0.8 and increase the number of images from 5 to 10. While the number of verified cases for the network ReLU_5_50 decreases (which can be explained in the same way as above), the number of verified cases for the network Sigmoid_3_10 increases (and the results for the other two networks do not change). Our explanation is that when we increase K to 10, it is easier for Algorithm 2 to conclude that there is no attack, and so Algorithm 1 still collects enough evidence to conclude H₀. On the other hand, when the number of images is 5, Algorithm 2 may return UNKNOWN many times (due to spurious triggers), and so the hypothesis testing in Algorithm 1 goes back and forth between the two hypotheses H₀ and H₁ and eventually times out.

Fig. 6. The running time of the experiments in RQ1 with benchmark networks

A slightly surprising result is obtained for the network Tanh_3_10: our trigger generation process generates two triggers, for the target labels 2 and 5, when the success rate is set to 0.8. This is surprising as none of these networks was trained with a backdoor attack. This result can potentially be explained by the combination of the relatively low success rate (i.e., 0.8) and the phenomenon known as universal adversarial perturbations [23]. With the returned triggers, users may want to investigate the network further and potentially improve it with techniques such as robust training [7,22].


RQ3: Is our approach efficient time-wise? To answer this question, we collect the wall-clock time to run the experiments in RQ1 and RQ2. For each network, we record the average running time over the 10 target labels. The results for the 45 small networks are shown in Fig. 6. The x-axis shows the three groups of 15 networks, categorized by activation function, and the y-axis shows the running time on a logarithmic scale in the form of boxplots (where the box spans the 25th to the 75th percentile, the bottom and top lines are the minimum and maximum, and the orange line is the median). The execution time ranges from 14 s to less than 6 h for these networks. Furthermore, there is not much difference between the running times of the networks using the ReLU and Sigmoid activation functions. However, the running time of the networks using the Tanh function is one order of magnitude larger than that of the ReLU and Sigmoid networks. This is because the Tanh networks have many non-safe cases (as shown in Fig. 4) and, as a result, the verification process needs to check more images at more trigger positions. The running time of the networks adopted from [32] ranges from more than 5 min to less than 4 h, as shown in Table 1. Finally, the running time for each network in RQ2 (i.e., the time required to verify the networks against backdoor attacks) under the different settings is shown in Table 2.
Table 1. The running time of the experiments in RQ1 with networks adapted from [32]

Network      | Running time | Network      | Running time
ReLU_3_1024  | 237 m 24 s   | conv         | 194 m 30 s
ReLU_5_100   | 5 m 38 s     | conv_diffai  | 111 m 12 s
ReLU_8_200   | 48 m 34 s    | conv_pgd     | 190 m 19 s


Table 2. The running time of the experiments in RQ2

Network       | K=5, θ=0.9 | K=10, θ=0.9 | K=5, θ=0.8 | K=10, θ=0.8
ReLU_3_10     | 31 m 31 s  | 46 m 39 s   | 55 m 44 s  | 68 m 54 s
ReLU_5_50     | 341 m 36 s | 493 m 30 s  | 551 m 40 s | 600 m 0 s
Sigmoid_3_10  | 46 m 43 s  | 59 m 28 s   | 92 m 34 s  | 93 m 21 s
Sigmoid_5_50  | 476 m 38 s | 588 m 25 s  | 600 m 0 s  | 600 m 0 s
Tanh_3_10     | 144 m 2 s  | 105 m 18 s  | 50 m 58 s  | 26 m 4 s
Tanh_5_50     | 600 m 0 s  | 600 m 0 s   | 600 m 0 s  | 600 m 0 s


RQ4: Can our approach generate backdoor triggers? Being able to generate counterexamples is an important part of a useful verification method. We conduct another experiment to evaluate the effectiveness of our backdoor trigger generation approach. We train a new set of 45 networks that have the same structure as those used for answering RQ1. The difference is that this time each network is trained to contain a backdoor through data poisoning. In particular, for each network, we randomly extract 20% of the training data, stamp a white square with shape (1, 3, 3) in one corner of the images, assign a random target label, and then train the neural network from scratch with the poisoned training data. While such an attack has been shown to be effective [12], it is not guaranteed to succeed on every randomly selected set of images. Thus, we do the following to make sure that there exists a trigger for a set of selected images. From the 10000 images in the test set, we first filter out the images that are misclassified or already classified with the target label. The remaining images are collected into a set X₀. Next, to make sure that the selected images have a high chance of being attacked successfully, we apply another filter on X₀. This time, we stamp each image in X₀ with a white square at the same trigger position as in the poisoned training data, and we keep the image if its stamped version is classified by the network with the target label. The remaining images after the second filter are collected into another set X. We apply our approach, in particular the backdoor trigger generation, on X if |X| / |X₀| ≥ 0.8, i.e., if the backdoor attack has a success rate of at least 80%.
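The poisoning and filtering steps of RQ4 can be sketched in plain Python. Assumptions: images are single-channel lists of rows, the "corner" is the bottom-right one, the target label is passed in rather than drawn at random, and all helper names are hypothetical; the paper does not fix these details. `attack_success_rate` computes |X| / |X₀|, which must reach 0.8 before trigger generation is attempted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]


def stamp_corner(image: Image, size: int = 3, value: float = 1.0) -> Image:
    """Copy of `image` with a white size x size square in its bottom-right corner."""
    out = [list(row) for row in image]
    for r in range(len(out) - size, len(out)):
        for c in range(len(out[0]) - size, len(out[0])):
            out[r][c] = value
    return out


def poison(dataset: Sequence[Tuple[Image, int]], target: int,
           fraction: float = 0.2, seed: int = 0) -> List[Tuple[Image, int]]:
    """Stamp a randomly chosen `fraction` of the samples and relabel them as `target`."""
    rng = random.Random(seed)
    data = list(dataset)
    for i in rng.sample(range(len(data)), int(fraction * len(data))):
        image, _ = data[i]
        data[i] = (stamp_corner(image), target)
    return data


def attack_success_rate(x0: Sequence[Image], predict: Callable[[Image], int],
                        target: int) -> float:
    """|X| / |X0|: share of images in X0 whose stamped version is classified as target."""
    if not x0:
        return 0.0  # e.g., a network so biased that X0 is empty
    hits = sum(predict(stamp_corner(img)) == target for img in x0)
    return hits / len(x0)


clean = [([[0.0] * 28 for _ in range(28)], 0) for _ in range(10)]
poisoned = poison(clean, target=7)
print(sum(label == 7 for _, label in poisoned))  # 2


def toy_predict(img: Image) -> int:
    """Toy backdoored 'network': predicts 7 whenever the corner pixel is white."""
    return 7 if img[-1][-1] == 1.0 else 0


rate = attack_success_rate([img for img, _ in clean], toy_predict, target=7)
print(rate, rate >= 0.8)  # 1.0 True
```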

Fig. 7. The results of backdoor trigger generation

The results are shown in Fig. 7, in which the y-axis shows the number of networks. The timeout is set to 120 s. Among the 45 networks, a trigger is successfully generated for 33 (i.e., 73%) of them. A closer investigation of these networks shows that the generated trigger is exactly the white square that is used to stamp the training data. There are 12 networks for which no trigger is generated. Investigating these networks, we see that they are either too biased (i.e., they classify every image with the target label, and thus |X₀| = 0) or the attack on them does not perform well (i.e., |X| / |X₀| < 0.8). In other words, the backdoor attack on these networks failed, and as a result, the generation process does not even begin for them. In a nutshell, we successfully generate the trigger for every successful backdoor attack. Finally, note that the running time of the backdoor generation process is reasonable (i.e., on average, 50 s to generate a backdoor trigger for one network), and thus it does not affect the overall performance of our verification algorithm.


5 Related Work 


The work closest to ours is [37], in which Wang et al. aim to certify neural networks' robustness against backdoor attacks using randomized smoothing. However, there are several notable differences between their approach and ours. First, while our work focuses on verifying the absence of a backdoor, their work aims to certify the robustness of individual images based on the provided training data and learning algorithm (which can be used to implicitly derive the network). Second, by using random noise to estimate the networks' behaviors, their approach can only obtain very loose results. As shown in their experiments, they can only certify robustness against backdoor attacks with triggers containing two pixels, on a “toy” network with only two layers and two labels, and only after simplifying the input features by rounding them to 0 or 1. Compared to their approach, our approach can be applied to networks that solve real image classification problems, as shown in our experiments.

Our work is closely related to a line of work on verifying neural networks. Existing approaches mostly focus on the local robustness property and can be roughly classified into two categories: exact methods and approximation methods. The exact methods aim to model the networks precisely and solve the verification problem using techniques such as mixed-integer linear programming [34] or SMT solving [8,15]. On the one hand, these approaches can guarantee sound and complete results in verifying neural networks. On the other hand, they often have limited scalability and are thus limited to small neural networks. Moreover, these approaches have difficulty handling activation functions other than ReLU.

In comparison, the approximation approaches over-approximate neural network behavior to gain better scalability. AI² [10] is the first work pursuing this direction using the classic abstract interpretation technique. Since then, researchers have explored different abstract domains for better precision without sacrificing too much scalability [29,30,32]. In general, the approximation approaches are more scalable than the exact methods, and they are capable of handling activation functions such as Sigmoid and Tanh. However, due to the over-approximation, these methods may fail to verify a valid property.

We also note that it is possible to incorporate abstraction refinement into the approximation methods to gain better precision, for instance, by splitting an abstraction into multiple parts to reduce the imprecision due to over-approximation. Many works [21,40,41] fall into this category. We remark that our approach is orthogonal to the development of sophisticated verification techniques for neural networks.

Finally, our approach, especially the part on backdoor trigger generation, is related to many approaches for generating adversarial samples for neural networks. Representative approaches in this category include FGSM [11], JSMA [24], and C&W [4], which aim to generate adversarial samples that violate the local robustness property, and [42], which aims to violate the fairness property.


6 Conclusion 


In this work, we propose the first approach to formally verify that a neural network is safe from backdoor attacks. We address the problem of how to verify the absence of a backdoor that reaches a certain level of success rate. Our approach is based on abstract interpretation, and we provide an implementation based on the DeepPoly abstract domain. The experimental results show the potential of our approach. In the future, we intend to extend our approach with more abstract domains as well as improve its performance to verify more real-life networks. We also intend to apply our approach to verify networks designed for other tasks, such as sound or text classification.
Abstract. Deep Reinforcement Learning (DRL) has demonstrated its strength in developing intelligent systems. These systems shall be formally guaranteed to be trustworthy when applied to safety-critical domains, which is typically achieved by formal verification performed after training. This train-then-verify process has two limitations: (i) trained systems are difficult to formally verify due to their continuous and infinite state space and inexplicable AI components (i.e., deep neural networks), and (ii) the ex post facto detection of bugs increases both the time- and money-wise cost of training and deployment. In this paper, we propose a novel verification-in-the-loop training framework called TRAINIFY for developing safe DRL systems driven by counterexample-guided abstraction and refinement. Specifically, TRAINIFY trains a DRL system on a finite set of coarsely abstracted but efficiently verifiable state spaces. When verification fails, we refine the abstraction based on returned counterexamples and train again on the finer abstract states. The process is iterated until all predefined properties are verified against the trained system. We demonstrate the effectiveness of our framework on six classic control systems. The experimental results show that our framework yields DRL systems that are more reliable than those trained by conventional DRL approaches, with provable guarantees and without sacrificing system performance such as cumulative reward and robustness.


Keywords: Deep reinforcement learning - Model checking - CEGAR - 
ACTL 


1 Introduction 


Deep Reinforcement Learning (DRL) has shown its strength in developing intelligent systems for complex control tasks such as autonomous driving [37,40]. Verifiable safety and robustness guarantees are crucial to these safety-critical DRL systems before deployment [23,44]. A typical example is autonomous driving, which is arguably still a long way off due to safety concerns [21,39]. Recently, tremendous efforts have been made toward adapting existing and devising new formal methods for DRL systems in order to provide provable safety guarantees [18,25,45,46,51].

Formally verifying DRL systems is still a challenging problem. The challenge arises from three features of DRL systems. First, the state space of a DRL system is usually continuous and infinite [28]. Second, the behavior of a DRL system is non-linear and determined by high-order system dynamics [17]. Last but not least, the controllers, typically deep neural networks (DNNs), are almost inexplicable because of their black-box development [20,52]. These three features make it infeasible to verify DRL systems using conventional formal methods, i.e., modeling them as state transition systems and verifying temporal properties using dedicated decision procedures [4]. Most existing approaches have to simplify the problem by abstraction or over-approximation techniques and restrict themselves to specific properties such as safety or reachability [46].

Another common problem with most existing formal verification approaches to DRL systems is that they are applied after training is concluded. These train-then-verify approaches have two limitations. First, verification results may be inconclusive due to abstraction or overestimation. The non-linearity of both system dynamics and deep neural networks makes it difficult to keep the overestimation within a reasonable range, resulting in false positives in verification results [50]. Second, the ex post facto detection of bugs increases both the time- and money-wise cost of training and deployment. There is no evidence that iterative training and verification help improve system reliability, as tuning the parameters in neural networks may have an unpredictable impact on the properties because of their inexplicability [24].

To address the challenges in training and verifying DRL systems, in this paper we propose a novel verification-in-the-loop framework for training safe and reliable DRL systems with verifiable guarantees. Provided that a set of properties is predefined for a target DRL system to be developed, our framework trains the system and verifies it against the properties in every iteration. To overcome the verification challenges in DRL systems, we propose, for the first time, an approach in our framework that trains the systems on a finite set of abstract states, based on the observation that approximate abstractions can still preserve near-optimal behavior [1]. These states are abstractions of the actual states. Training on the finite abstract states allows us to model the AI-embedded systems as finite-state transition systems, so we can leverage classic model checking techniques to verify temporal properties that are more complicated than safety and reachability.

As system performance may be affected by the abstraction granularity, we employ the idea of counterexample-guided abstraction and refinement (CEGAR) [8] from model checking along the training process. We start with a coarsely abstracted but efficiently verifiable state space and train and verify DRL systems on the abstract state space. Once verification fails, we refine the abstract state space based on the returned counterexamples and retrain the system on the finer-grained refined state space. The process is repeated until all the properties are verified successfully. We therefore call the training and verification framework CEGAR-driven; it allows us to reach an appropriate abstraction granularity that guarantees both system performance and verification scalability.
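The shape of such a CEGAR loop can be illustrated with a deliberately tiny toy over a one-dimensional state x in [0, 1), where abstract states are intervals. Everything here is a hypothetical stand-in, not TRAINIFY's procedure: `train` merely picks each abstract state's action from its midpoint (in place of DRL training), `verify` checks a toy specification (the correct action is True exactly when x ≥ 0.625) instead of model checking temporal properties, and `refine` splits the abstract state containing the counterexample in half.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]   # abstract state [lo, hi) over a 1-D concrete state
Policy = Dict[Interval, bool]    # one action per abstract state

THRESHOLD = 0.625                # toy spec: the correct action is True iff x >= THRESHOLD


def correct_action(x: float) -> bool:
    return x >= THRESHOLD


def train(partition: List[Interval]) -> Policy:
    """Toy stand-in for training: choose each abstract state's action at its midpoint."""
    return {(lo, hi): correct_action((lo + hi) / 2) for lo, hi in partition}


def verify(policy: Policy) -> Optional[float]:
    """Return a concrete counterexample state, or None if every abstract state is correct."""
    for (lo, hi), action in policy.items():
        if lo < THRESHOLD < hi:                  # states in here need different actions
            return lo if action else THRESHOLD   # a state whose correct action differs
    return None


def refine(partition: List[Interval], cex: float) -> List[Interval]:
    """Split the abstract state that contains the counterexample in half."""
    refined: List[Interval] = []
    for lo, hi in partition:
        if lo <= cex < hi:
            mid = (lo + hi) / 2
            refined += [(lo, mid), (mid, hi)]
        else:
            refined.append((lo, hi))
    return refined


def cegar_train(partition: List[Interval], max_iters: int = 20) -> Tuple[Policy, List[Interval], int]:
    """Train, verify, and refine until verification succeeds."""
    for refinements in range(max_iters):
        policy = train(partition)
        cex = verify(policy)
        if cex is None:
            return policy, partition, refinements
        partition = refine(partition, cex)
    raise RuntimeError("abstraction did not converge within max_iters refinements")


policy, partition, n = cegar_train([(0.0, 1.0)])
print(n)          # 3
print(partition)  # [(0.0, 0.5), (0.5, 0.625), (0.625, 0.75), (0.75, 1.0)]
```

Refinement only splits where a counterexample was found, so the abstraction stays coarse elsewhere; that is the trade-off between granularity and verification cost described above.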

Our verification-in-the-loop training framework has four advantages compared with conventional DRL training and verification approaches. Firstly, our approach produces correct-by-construction DRL systems that are verifiably safe with respect to user-defined safety requirements. Secondly, more complicated properties such as safety and liveness can be verified thanks to the dedicated training approach on the abstracted state space. Another advantage of the training approach is that it is orthogonal to state-of-the-art DRL algorithms such as Deep Q-Network (DQN) [34] and Deep Deterministic Policy Gradient (DDPG) [32]. Thirdly, our approach provides a flexible mechanism for fine-tuning an appropriate abstraction granularity to balance system performance and verification scalability. Lastly, training on abstract states makes DRL systems more robust against adversarial and environmental perturbations, because a small perturbation to an actual state may leave it in the same abstract state, on which the neural network makes the same decision.

We implement a prototype tool called TRAINIFY (short for Train and Verify, available at https://github.com/aptx4869tjx/RL_verification). We perform extensive experiments on six classic control tasks in public benchmarks to evaluate the effectiveness of our framework. For each task, we train two DRL systems under the same settings, one with our approach and one with the corresponding conventional DRL algorithm. We compare the two systems in terms of the properties that they shall satisfy and their performance in terms of cumulative reward and robustness. Experimental results show that the systems trained with our approach are more efficient to verify and more reliable than those trained with conventional methods; moreover, their performance is competitive or even higher.

In summary, this paper makes the following three major contributions: 


1. A novel verification-in-the-loop training framework for developing verifiable and reliable DRL systems with correct-by-construction guarantees.

2. A CEGAR-driven approach for fine-tuning abstraction granularity during training to reach a balance between system performance and verification scalability.

3. A resulting prototype tool called TRAINIFY for training and verifying DRL systems and a thorough evaluation of the proposed approach on public benchmarks.


Paper Organization. Section 2 briefly introduces deep reinforcement learning. Section 3 presents the model checking problem of DRL systems. Section 4 presents our training and verification framework. Section 5 shows six case studies and experimental results. Section 6 discusses related work, and Sect. 7 concludes the paper.
2 Deep Reinforcement Learning (DRL)


DRL is a technique for learning optimal control policies using deep neural networks according to evaluative feedback [31]. An agent in a DRL system interacts with the environment and records its state $s_t$ at each time step $t$. It feeds $s_t$ into a deep neural network to compute an action $a_t$ and transitions to the next state $s_{t+1}$ according to $a_t$ and the system dynamics. The system dynamics describe the non-linear behavior of the agent over time. The agent receives a scalar reward according to reward functions. Some algorithms estimate the distance between the action determined by the network and the expected action in the same state. Then, they update the parameters in the network according to the estimated distance to maximize the cumulative reward.


A Running Example. Figure 1 shows a classic DRL task of learning a control policy to drive a car to the right hilltop. The car is initially positioned on a track between two mountains. The track is one-dimensional, and thus the car's position is represented as a real number. Velocity is another dimension in the car's state and is represented as a real number too. Thus, the car's state is a pair $(p, v)$ of position $p$ and velocity $v$. An action $a$ is a real number representing the force imposed on the car. The action is computed by a neural network on both $p$ and $v$.

Fig. 1. A DRL example of mountain car system (a deep neural network maps the state $(p, v)$, with $p \in [-1.2, 0.6]$ and $v \in [-0.07, 0.07]$, to an action $a$ and is updated during training).

The sign of $a$ indicates the direction of the force, i.e., positive for the right and negative for the left. Given a state $s_t = (p_t, v_t)$ and an action $a_t$ at time step $t$, the system transitions to the next state $s_{t+1} = (p_{t+1}, v_{t+1})$ following the given dynamics:

$$p_{t+1} = p_t + v_t \Delta_t, \tag{1}$$

$$v_{t+1} = v_t + \left(a_t - m_c \times g \times \cos(3 p_t)\right) \Delta_t, \tag{2}$$

where $m_c$ denotes the car's mass, $g$ denotes the gravity, and $\Delta_t$ is the unit interval between two consecutive steps. In DRL, time is usually discretized to facilitate implementation. The car is assumed to move in uniform motion during a unit interval.
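Equations (1) and (2) translate directly into a one-step transition function. The constants are assumptions, since the text does not give them: $\Delta_t = 1$, and the product $m_c \times g$ is set to 0.0025, a value chosen to match the scale of the classic Mountain Car benchmark. Unlike many benchmark implementations, this sketch does not clip $p$ and $v$ to their ranges.

```python
import math
from typing import Tuple


def mountain_car_step(p: float, v: float, a: float,
                      dt: float = 1.0, mc_g: float = 0.0025) -> Tuple[float, float]:
    """One transition following Eqs. (1)-(2).

    p' = p + v * dt                          (1)
    v' = v + (a - m_c * g * cos(3p)) * dt    (2)
    `mc_g` stands for the product m_c * g; its default value is an assumption.
    """
    p_next = p + v * dt
    v_next = v + (a - mc_g * math.cos(3 * p)) * dt
    return p_next, v_next


print(mountain_car_step(0.0, 0.0, 0.0))  # (0.0, -0.0025)

state = (-0.5, 0.0)
for _ in range(3):
    state = mountain_car_step(*state, a=0.001)  # push right with a small constant force
```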


Reward Setting. The reward function $R$ maps state $s_t$, action $a_t$, and state $s_{t+1}$ to a real number, which represents the reward for applying $a_t$ to $s_t$ to transition to $s_{t+1}$. The purpose of $R$ is to guide the agent to achieve the preset goals by making the cumulative reward as large as possible. The definition of $R$ is based on prior knowledge or expert experience before training.

In the Mountain Car example, the controller receives the reward $R((p_t, v_t), a_t, (p_{t+1}, v_{t+1})) = -1.0$ at each time step when $p_{t+1} < 0.45$. The reward is a negative constant because the goal in this example is to force the car to reach the right hilltop ($p = 0.45$) as quickly as possible. If one run reaches the destination with a larger cumulative reward than another, the car takes fewer steps in that run. A reward function can be a more complex formula than a constant when the reward strategy depends on states and actions.
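The link between cumulative reward and the number of steps can be checked with a small sketch. The text only defines the reward while $p_{t+1} < 0.45$; the sketch assumes the episode ends with a reward of 0.0 on the transition that reaches the goal, and the helper names are hypothetical.

```python
from typing import Sequence

GOAL = 0.45  # position of the right hilltop


def reward(p_next: float) -> float:
    """-1.0 while the goal is not reached; 0.0 on reaching it (assumption)."""
    return -1.0 if p_next < GOAL else 0.0


def cumulative_reward(positions: Sequence[float]) -> float:
    """Sum the rewards along p_0, p_1, ..., stopping once the car reaches the goal."""
    total = 0.0
    for p_next in positions[1:]:
        total += reward(p_next)
        if p_next >= GOAL:
            break
    return total


print(cumulative_reward([-0.5, -0.2, 0.2, 0.5]))              # -2.0
print(cumulative_reward([-0.5, -0.4, -0.2, 0.1, 0.3, 0.5]))   # -4.0 (more steps)
```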


Training. The essence of DRL training is to update parameters in neural networks so that the networks can compute optimal actions for input states. A deep neural network is a directed graph comprised of an input layer, multiple hidden layers, and an output layer, as shown in Fig. 2. Each layer contains several nodes called neurons. They are connected to the neurons on the following layer. Each edge has a weight. The values passed on the edge are multiplied by the weight. A neuron on a hidden layer takes the sum of all the incoming values, adds a bias, and feeds the result to its activation function $\sigma$. The output of $\sigma$ is passed to the neurons on the following layer. There are several commonly used activation functions, e.g., ReLU ($\sigma(x) = \max(x, 0)$), Sigmoid ($\sigma(x) = \frac{1}{1 + e^{-x}}$), and Tanh ($\sigma(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$). In DRL, the inputs to a neural network are system states. The outputs are (possibly continuous) actions that shall be performed in the present state.

During training, agents continuously interact with the environment to obtain trajectories. A trajectory is a 4-tuple, consisting of a state $s$, the action $a$ on $s$, the reward of executing $a$ on $s$, and the successor state after the execution. A predefined loss function uses the collected trajectories to estimate an action value and compute the distance between the estimated value and the one computed by the neural network for the same state. Guided by the distance, the parameters in the network are updated using gradient descent algorithms [12]. The process is repeated until the system reaches a predefined maximal iteration limit or a preset cumulative reward threshold.

Fig. 2. A simple neural network (input neurons $p$ and $v$, a hidden layer, and an output layer; edges carry weights $w$ and neurons carry biases $b$).

Algorithm 1: Training for the Mountain Car Task using DQN

1 for episode = 1, ..., M do
2     Initialize s_0 = (p_0, v_0)
3     for t = 0, ..., T do
4         a_t ← N(p_t, v_t);  /* To determine a_t based on s_t = (p_t, v_t) and N */
5         (s_{t+1}, −1.0) ← system(s_t, a_t);  /* To execute a_t and transition to the next state s_{t+1} */
6         P ← ℒ(N, (s_i, a_i, −1.0, s_{i+1}), ..., (s_j, a_j, −1.0, s_{j+1}));  /* To compute the distance */
7         N ← update(N, P);  /* To update parameters in N based on P */
There are several well-established training algorithms, such as Deep Q-Network (DQN) [35] and Deep Deterministic Policy Gradient (DDPG) [82]. Algorithm 1 depicts a high-level process of training the mountain car using DQN. We call the process of training the car to move from the initial position to the destination an episode. For each episode, the initial state is first determined (Line 2). Then, the controller determines the action to be adopted based on the current state $s_t$ and the neural network $\mathcal{N}$ (Line 4). After performing the action, the controller receives a reward value ($-1.0$ in this case) and transitions to the next state based on the system dynamics (Line 5). A loss is estimated by calling the loss function $\mathcal{L}$ with partially sampled trajectories. The loss, denoted by $P$ (Line 6), is used to update the parameters of the network $\mathcal{N}$ (Line 7). We omit the details of $\mathcal{L}$, as it is not the emphasis of our paper.
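A minimal sketch of the rollout part of Algorithm 1 (Lines 2 to 5) follows. It assumes the Mountain Car update $p_{t+1} = p_t + v_t$, $v_{t+1} = v_t + a_t - 0.0025\cos(3p_t)$, read off the optimization objectives in Sect. 4.3 (Eq. 1 itself is not reproduced here), omits the position and velocity clipping that Gym applies, and replaces the network $\mathcal{N}$ with a hypothetical hand-written policy `bang_bang`. The loss and update of Lines 6 and 7 are left out.

```python
import math
from typing import Callable, List, Tuple

State = Tuple[float, float]                     # (position p, velocity v)
Transition = Tuple[State, float, float, State]  # the 4-tuple (s, a, r, s')

def system(s: State, a: float) -> Tuple[State, float]:
    """Assumed Mountain Car step (no clipping): p' = p + v, v' = v + a - 0.0025*cos(3p).
    Every step yields the reward -1.0, as in Algorithm 1."""
    p, v = s
    return (p + v, v + a - 0.0025 * math.cos(3 * p)), -1.0

def collect_episode(policy: Callable[[State], float], s0: State,
                    T: int) -> List[Transition]:
    """Lines 2-5 of Algorithm 1: roll the policy out for t = 0, ..., T and record 4-tuples."""
    trajectory: List[Transition] = []
    s = s0
    for _ in range(T + 1):
        a = policy(s)                 # Line 4: choose the action for the current state
        s_next, r = system(s, a)      # Line 5: apply the dynamics, receive reward -1.0
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory

def bang_bang(s: State) -> float:
    """Hypothetical stand-in for the network N: push in the direction of motion."""
    return 0.001 if s[1] >= 0 else -0.001

traj = collect_episode(bang_bang, (-0.5, 0.0), T=9)
```

The recorded 4-tuples are what the loss function $\mathcal{L}$ would consume in Line 6; consecutive tuples chain, since each successor state is the next tuple's state.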


The Target DRL Systems in this Work. DRL systems can be classified from different perspectives, such as the availability of system dynamics [17] and the determinism of actions. In this work, we assume that the system dynamics are known prior to training and that actions are deterministic. That is, a unique action is taken in each state, and its successor state is also uniquely determined by the system dynamics.


3 Model Checking of DRL Systems 


3.1 The Model Checking Problem 


A trained deterministic DRL system can be represented as a tuple $\mathcal{M} = (S, A, f, \pi, S^0, L)$, where $S$ is the state space, which is usually infinite, $S^0 \subseteq S$ is the initial state space, $A$ is a set of actions, $f: S \times A \to S$ is the system dynamics, $\pi: S \to A$ is a policy function, and $L: S \to 2^{AP}$ is a state labeling function over a set $AP$ of atomic propositions. In this work, we use $\pi$ to denote the policy that is implemented by the trained deep neural network in the system.

The model $\mathcal{M}$ of a DRL system is essentially a Kripke structure [10], which is a 4-tuple $(S, R, S^0, L)$. Given two arbitrary states $s, s'$ in $S$, there is a transition from $s$ to $s'$, denoted by $(s, s') \in R$, if and only if there is an action $a$ in $A$ such that $a = \pi(s)$ and $s' = f(s, a)$. Given a property formalized by a formula $\Phi$ in some logic, the model checking problem of the system is to decide whether $\mathcal{M}$ satisfies $\Phi$, denoted by $\mathcal{M} \models \Phi$.

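In this work, we formulate properties in ACTL [4], the fragment of CTL in which only universal path quantifiers are allowed and negation is restricted to atomic propositions [14,15]. ACTL consists of state formulas $\Phi$ and path formulas $\varphi$ with the following syntax, where $a \in AP$:

$$\Phi ::= \mathit{true} \mid \mathit{false} \mid a \mid \neg a \mid \Phi_1 \wedge \Phi_2 \mid \Phi_1 \vee \Phi_2 \mid \mathbf{A}\,\varphi,$$
$$\varphi ::= \mathbf{X}\,\Phi \mid \Phi_1\,\mathbf{U}\,\Phi_2 \mid \Phi_1\,\mathbf{R}\,\Phi_2.$$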


The temporal operators fall into two main categories, i.e., quantifiers over paths and path-specific quantifiers. In ACTL, only the universal path quantifier $\mathbf{A}$ is considered. The path-specific quantifiers are $\mathbf{X}$, $\mathbf{U}$ and $\mathbf{R}$.
- $\mathbf{A}\,\varphi$: Path formula $\varphi$ has to hold on all paths starting from the current state.

- $\mathbf{X}\,\Phi$: State formula $\Phi$ has to hold at the next state.

- $\Phi_1\,\mathbf{U}\,\Phi_2$: State formula $\Phi_1$ has to hold at least until state formula $\Phi_2$ holds, and $\Phi_2$ must eventually hold.

- $\Phi_1\,\mathbf{R}\,\Phi_2$: Formula $\Phi_2$ has to hold until and including the point where $\Phi_1$ first becomes true. If $\Phi_1$ never becomes true, $\Phi_2$ must hold forever.


Using the above basic temporal operators, we can define two other important path-specific quantifiers, $\mathbf{G}$ (globally) and $\mathbf{F}$ (finally), with $\mathbf{G}\,\Phi = \mathit{false}\,\mathbf{R}\,\Phi$ and $\mathbf{F}\,\Phi = \mathit{true}\,\mathbf{U}\,\Phi$. Intuitively, $\mathbf{G}\,\Phi$ means that $\Phi$ has to hold on the entire subsequent path, and $\mathbf{F}\,\Phi$ means that $\Phi$ eventually has to hold (somewhere on the subsequent path).
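For intuition about how the derived operators are checked on a finite Kripke structure, the sketch below computes the states satisfying $\mathbf{AG}\,\Phi$ and $\mathbf{AF}\,\Phi$ for an atomic $\Phi$ by the standard fixpoint iterations. It is an illustrative explicit-state algorithm on a toy successor map, not the implementation of pyModelChecking or any other checker, and it assumes that every state has at least one successor.

```python
from typing import Dict, Hashable, Iterable, Set

# A finite Kripke structure given by its successor map; every state is assumed
# to have at least one successor (a total transition relation).
Succ = Dict[Hashable, Set[Hashable]]

def sat_AG(R: Succ, phi: Set[Hashable]) -> Set[Hashable]:
    """States satisfying AG phi: the greatest Z within phi such that R[s] is a subset of Z for all s in Z."""
    Z = {s for s in R if s in phi}
    changed = True
    while changed:
        changed = False
        for s in list(Z):
            if not R[s] <= Z:          # some successor leaves Z, so some path leaves phi
                Z.discard(s)
                changed = True
    return Z

def sat_AF(R: Succ, phi: Set[Hashable]) -> Set[Hashable]:
    """States satisfying AF phi: least fixpoint of Z = phi union {s | R[s] subset of Z}."""
    Z = {s for s in R if s in phi}
    changed = True
    while changed:
        changed = False
        for s in R:
            if s not in Z and R[s] <= Z:   # every successor already guarantees phi
                Z.add(s)
                changed = True
    return Z

def holds(init: Iterable[Hashable], sat: Set[Hashable]) -> bool:
    """K satisfies the formula iff every initial state does."""
    return all(s in sat for s in init)

# Toy structure: 0 -> 1 -> 2 -> 2, and 3 -> {3, 0}.
R = {0: {1}, 1: {2}, 2: {2}, 3: {3, 0}}
```

Here `sat_AF(R, {2})` is `{0, 1, 2}`: state 3 can loop on itself forever, so $\mathbf{AF}$ fails there. `sat_AG(R, {1, 2, 3})` is `{1, 2}`, because state 3 can move to state 0, which lies outside the set.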

We choose ACTL to formulate system properties or requirements in our framework for two main reasons. Firstly, our framework refines the abstract states where system properties are violated. Such states can be obtained as counterexamples returned by model checkers when system properties defined in ACTL are found not to hold. Secondly, the verification results of ACTL formulas are preserved by property-based abstraction [9,11]. Such preservation is vital to the correctness of our verification results, because abstraction is necessary for our framework to keep the verification algorithm scalable.


3.2 Challenges in Model Checking DRL Systems 


Unlike model checking for finite-state systems, model checking $\mathcal{M} \models \Phi$ for DRL systems is particularly challenging. The challenge arises from three features of DRL systems: (i) the infinity and continuity of the state space $S$, (ii) the non-linearity of the system dynamics $f$, and (iii) the inexplicability of the policy $\pi$, which is encoded as a deep neural network. Usually, the state space of DRL systems is continuous and infinite, and their behaviors are non-linear due to high-order system dynamics. Even worse, the actions taken in states are determined by inexplicable deep neural networks, which means that the transitions between states cannot be defined as straightforwardly as those of traditional software systems.

To build a model $\mathcal{M}$ for a DRL system, we have to compute the successor of each state $s$ by applying the neural network $\pi$ to $s$ to compute the action $a$, and then applying $a$ to $s$ according to the system dynamics $f$. Specifically, the successor of $s$ is $f(s, \pi(s))$. The non-linearity of both $f$ and $\pi$ and the infinity of $S$ make the verification problem difficult. Most existing approaches rely on over-approximating $f$ and $\pi$ to simplify the problem [16,25,29,46]. However, over-approximation inevitably introduces overestimation and restricts these approaches to safety properties and reachability analysis within a bounded number of steps.


200 P. Jin et al. 


4 The CEGAR-Driven DRL Approach 


4.1 The Framework 


Figure 3 shows an overview of our framework. It consists of three parts, i.e., training, verification and refinement. In the training part, a DRL system is trained on a finite set of abstract states. An actual state is first mapped to its corresponding abstract state, which is then fed into the neural network to compute a corresponding action. The action is applied to the actual state to drive the system to the next state. The reward is accumulated according to a predefined reward function, and the neural network is updated in the same way as in conventional DRL algorithms. In the verification part, we build a Kripke structure on the finite abstract state space based on the trained neural network. Then, we verify the desired properties, which are predefined as ACTL formulas $\Phi$. If all the properties are verified valid, we stop training, and the development of the DRL system is complete. If some property is not valid, we move to the refinement part. When verification fails, counterexamples are returned. They are the abstract states where the property is violated. We refine these states by subdividing them into fine-grained sub-states and substitute the sub-states for the bad states. We then resume training the system on the refined abstract state space and repeat the whole process.
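The loop in Fig. 3 can be summarized by the following Python skeleton. The callables `train`, `verify` and `refine` are hypothetical placeholders for the three parts of the framework, not functions of Trainify; the toy stand-ins at the end only exercise the control flow.

```python
from typing import Any, Callable, List, Tuple

def train_verify_refine(space: Any,
                        train: Callable[[Any, Any], Any],
                        verify: Callable[[Any, Any], List[Any]],
                        refine: Callable[[Any, List[Any]], Any],
                        max_iters: int = 100) -> Tuple[Any, Any, int]:
    """Hypothetical driver for the training-verification-refinement loop.

    train(space, model)  -> model trained (further) on the abstract states in space
    verify(space, model) -> counterexample abstract states; empty if all properties hold
    refine(space, cexs)  -> space with each counterexample replaced by finer sub-states
    """
    model = None
    for iteration in range(1, max_iters + 1):
        model = train(space, model)
        cexs = verify(space, model)
        if not cexs:                     # every ACTL property verified: stop training
            return space, model, iteration
        space = refine(space, cexs)      # refine only where properties failed
    raise RuntimeError("properties still violated after max_iters iterations")

# Toy stand-ins: the "space" is just a resolution that verification needs to be >= 4.
def toy_train(space, model):
    return ("model", space)

def toy_verify(space, model):
    return [] if space >= 4 else ["bad abstract state"]

def toy_refine(space, cexs):
    return space * 2

result = train_verify_refine(1, toy_train, toy_verify, toy_refine)
```

With these stand-ins, the resolution is refined from 1 to 2 to 4, and verification succeeds in the third iteration.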


[Figure: Training, Verification and Refinement panels. A current state $(p, v) = (-0.9, -0.4)$, with $p \in [-1.0, 1.0]$ and $v \in [-0.5, 0.5]$, is mapped to its abstract state in the abstract state space and fed to the deep neural network, which outputs the action $-0.2$. The abstract state space is model checked against an ACTL formula $\Phi$; if verification fails, the counterexamples are refined and replaced in the abstract state space, otherwise training is done.]

Fig. 3. The training, verification and refinement framework for developing DRL systems.
The integration of training, verification and refinement constitutes a verification-in-the-loop DRL approach driven by counterexample-guided abstraction and refinement. We start with a coarse abstraction. After every training episode, we model check the system against all the predefined properties. If all the properties are verified, we stop training and obtain a verified system. Otherwise, counterexamples are returned, and the abstract state space is refined for further training. After several iterations, we obtain a trained DRL system on which all the predefined properties are rigorously verified.


4.2 Training on Abstract States 


DRL is a process of learning optimal actions on all system states for specific objectives. A trained model partitions the state space into a family of sets such that the same action is taken in all the states of a set [38]. Continuous state spaces can be adaptively discretized into finite ones for learning without affecting learning performance [41,42]. Motivated by this observation, we discretize a continuous state space into a finite set of fragments. We call each fragment an abstract state and train the DRL system by feeding abstract states into the deep neural network for decision making.


[Figure: (a) An abstract state space over $[p_0, p_5] \times [v_0, v_4]$, divided into boxes $[p_i, p_{i+1}] \times [v_j, v_{j+1}]$. (b) An R-tree of the abstract state space, whose leaf nodes hold these boxes and whose non-leaf nodes hold their bounding rectangles.]

Fig. 4. An example of encoding an abstract state space into an R-tree.


System State Abstraction. Given an $n$-dimensional DRL system, a concrete system state $s$ is represented as a vector of $n$ real numbers. Each number has a physical meaning in the system, such as the speed and position in the running example. Let $L_i$ and $U_i$ be the lower and upper bounds of the $i$-th dimension of $S$. Then, the state space of the control system is $S = \prod_{i=1}^{n} [L_i, U_i]$.

Initially, we use interval boxes to discretize $S$. An interval box $I$ is a vector of $n$ intervals, denoted by $(I_1, I_2, \ldots, I_n)$. Each interval $I_i$ ($1 \le i \le n$) represents the set $S_{I_i}$ of all the system states whose $i$-th value lies in $I_i$. An interval box $I$ represents the intersection of all the sets $S_{I_i}$ ($i = 1, \ldots, n$).
Let $d_i \in \mathbb{R}$ ($0 < d_i < U_i - L_i$) be the diameter by which we evenly subdivide the interval $[L_i, U_i]$ in each dimension $i$ into $(U_i - L_i)/d_i$ unit intervals, and let $\mathcal{I}_i = [L_i, U_i]/d_i$ denote the set of all these unit intervals. Then, we obtain the abstract state space $\mathbf{S} = \mathcal{I}_1 \times \cdots \times \mathcal{I}_n$, which is an abstraction of the infinite continuous state space $S$. We call the vector $(d_1, d_2, \ldots, d_n)$ of the $n$ diameters the abstraction granularity and denote it by $\delta$.
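As a sketch of the even subdivision above, the function below enumerates the abstract state space $\mathcal{I}_1 \times \cdots \times \mathcal{I}_n$ for a given granularity $\delta$. The bounds come from Fig. 3; the granularity $(0.1, 0.1)$ and the function names are arbitrary choices for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]   # an abstract state: one interval per dimension

def unit_intervals(lo: float, hi: float, d: float) -> List[Interval]:
    """Split [lo, hi] evenly into (hi - lo)/d unit intervals of diameter d.
    Assumes (hi - lo) is (close to) an integer multiple of d."""
    k = round((hi - lo) / d)
    return [(lo + j * d, lo + (j + 1) * d) for j in range(k)]

def discretize(bounds: Sequence[Interval], delta: Sequence[float]) -> List[Box]:
    """Abstract state space I_1 x ... x I_n for granularity delta = (d_1, ..., d_n)."""
    per_dim = [unit_intervals(lo, hi, d) for (lo, hi), d in zip(bounds, delta)]
    return [tuple(box) for box in product(*per_dim)]

# Mountain Car bounds p in [-1.0, 1.0], v in [-0.5, 0.5], granularity (0.1, 0.1).
boxes = discretize([(-1.0, 1.0), (-0.5, 0.5)], [0.1, 0.1])
```

With these values, the position range is split into 20 unit intervals and the velocity range into 10, giving 200 abstract states; halving either diameter doubles that count, which is why refinement is applied only where needed.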

Given a continuous state space $S$ and its corresponding abstract state space $\mathbf{S}$, we call the function mapping the states in $S$ to the corresponding abstract states in $\mathbf{S}$ a transformer $\mathcal{A}: S \to \mathbf{S}$. The transformer can be encoded as an R-tree, a tree-like data structure devised for efficiently indexing multi-dimensional objects [22]. Figure 4 depicts an example of building an R-tree to index an abstract state space of the continuous space $[v_0, v_4] \times [p_0, p_5]$. A rectangle on a leaf node represents an abstract state, and a rectangle on a non-leaf node represents the minimum bounding rectangle enclosing all the rectangles on its child nodes. A single node can hold multiple rectangles. R-trees support intersection search, i.e., searching for the abstract states that intersect with a queried interval box. Given a concrete state, an R-tree can thus quickly return its corresponding abstract state. Note that in Fig. 4 we assume, for clarity, that the state space is discretized evenly. During training, the sizes of abstract states become diverse after iterative refinement, and the R-tree must be updated correspondingly.
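The transformer $\mathcal{A}$ answers a point query: which abstract state contains a given concrete state. The following sketch implements that query with a linear scan over a list of boxes, which stands in for an R-tree search. It returns the same kind of answer but without the logarithmic lookup an R-tree provides, and the function names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]

def contains(box: Box, s: Sequence[float]) -> bool:
    """True if every coordinate of s lies in the matching closed interval of box."""
    return all(lo <= x <= hi for (lo, hi), x in zip(box, s))

def transformer(boxes: List[Box], s: Sequence[float]) -> Optional[Box]:
    """Map a concrete state to its abstract state. A linear scan stands in for the
    R-tree point query; a point on a shared boundary goes to the first box listed."""
    for box in boxes:
        if contains(box, s):
            return box
    return None

# Boxes of different sizes, as after iterative refinement.
boxes = [((0.0, 0.1), (0.0, 0.02)),
         ((0.1, 0.2), (0.0, 0.02)),
         ((0.2, 0.4), (0.0, 0.02))]
```

For example, `transformer(boxes, (0.15, 0.01))` returns the middle box, and a state outside all boxes yields `None`. A real R-tree would prune the search using the bounding rectangles of its non-leaf nodes.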


The Training Algorithms. Training on abstract states can be achieved by extending existing DRL algorithms such as DQN and DDPG. The extension only requires adapting the neural networks and loss functions of these algorithms so that they admit abstract states as inputs.


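Algorithm 2: Abstraction-Based DRL Training

1 for episode = 1, ..., M do
2   $\mathcal{A} \leftarrow$ discretize($S$, $\delta$); /* To discretize $S$ by abstraction granularity $\delta$ */
3   Initialize $s_0$;
4   for t = 0, ..., T do
5     $\mathbf{s}_t \leftarrow \mathcal{A}(s_t)$; /* To get the abstract state of $s_t$ */
6     $a_t \leftarrow \mathcal{N}'(\mathbf{s}_t)$; /* To determine action $a_t$ based on $\mathbf{s}_t$ and $\mathcal{N}'$ */
7     $(s_{t+1}, r_t) \leftarrow$ system($s_t$, $a_t$); /* To execute $a_t$ on $s_t$ and transition to $s_{t+1}$ with reward $r_t$ */
8     $P \leftarrow$ Loss($\mathcal{N}'$, $(\mathbf{s}_i, a_i, r_i, \mathbf{s}_{i+1}), \ldots, (\mathbf{s}_j, a_j, r_j, \mathbf{s}_{j+1})$); /* To get the loss due to $a_t$ */
9     $\mathcal{N}' \leftarrow$ update($\mathcal{N}'$, $P$); /* To update parameters in $\mathcal{N}'$ based on $P$ */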


TRAINIFY 203 


For neural networks, we only need to modify the input layer by doubling its number of neurons; we denote the resulting network by $\mathcal{N}'$. Given an $n$-dimensional system, we declare $2n$ input neurons. Each pair of neurons reads the lower and upper bounds of an interval in an abstract state, respectively. This dedicated structure guarantees that a trained network produces the same action for all the states that correspond to the same abstract state.

Figure 5 shows an example of adapting the network in the Mountain Car for training it on abstract states. For traditional DRL algorithms, two input neurons are needed in the neural network to take $p$ and $v$ as inputs, respectively. To train on abstract states, four input neurons are needed to take the lower and upper bounds of the position and velocity intervals in abstract states. For instance, let the interval box $(I_p, I_v)$ be the abstract state of $(p, v)$. Then, the lower bounds $\underline{I_p}, \underline{I_v}$ and the upper bounds $\overline{I_p}, \overline{I_v}$ of $p, v$ are input to the four neurons, respectively. This adaptation guarantees that the neural network always produces the same action on the states that are transformed into the same abstract state.

Fig. 5. Adapting neural networks by abstract states.
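The following sketch illustrates why the $2n$-input encoding yields one action per abstract state: two different concrete states in the same box produce identical network inputs. It assumes an evenly discretized space aligned at 0, an input ordering $(l_1, u_1, \ldots, l_n, u_n)$, and a hypothetical `toy_policy` in place of $\mathcal{N}'$; none of these details are prescribed by the paper.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]

def abstract(s: Sequence[float], delta: Sequence[float]) -> Box:
    """Uniform-grid transformer (assumes a grid aligned at 0): the box of
    diameter d_i containing s_i in each dimension."""
    box = []
    for x, d in zip(s, delta):
        k = math.floor(x / d)             # index of the unit interval containing x
        box.append((k * d, (k + 1) * d))
    return tuple(box)

def encode(box: Box) -> List[float]:
    """The 2n inputs of N': (l_1, u_1, ..., l_n, u_n). The ordering is an assumption."""
    return [bound for (lo, hi) in box for bound in (lo, hi)]

def toy_policy(x: List[float]) -> float:
    """Hypothetical stand-in for N': any deterministic function of the 2n inputs."""
    return 0.001 if sum(x) >= 0 else -0.001

delta = (0.2, 0.02)
s1, s2 = (0.03, 0.005), (0.17, 0.012)     # two different concrete states
a1 = toy_policy(encode(abstract(s1, delta)))
a2 = toy_policy(encode(abstract(s2, delta)))
```

Both states fall into the box $([0, 0.2], [0, 0.02])$, so the network sees the inputs `[0.0, 0.2, 0.0, 0.02]` in both cases and `a1 == a2` holds regardless of how the policy is trained.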

We consider incorporating these two steps to extend Algorithm 1 as an illustrative example. Algorithm 2 depicts the main workflow where the differences are highlighted. The main difference from the traditional training process lies in line 6. Given a concrete state $s = (s_1, \ldots, s_n)$, $\mathcal{A}$ returns the abstract state $\mathbf{s} = ([l_1, u_1], \ldots, [l_n, u_n])$ such that $l_i \le s_i \le u_i$ for $i = 1, \ldots, n$, which is then fed into the neural network. Although the dimension of the input states increases, the form of the corresponding output actions does not change. Therefore, the loss function naturally adapts to the change in input states.


4.3 Model Checking Trained DRL Systems 


A DRL system can be naturally verified using abstract model checking [26]. The actual states of the system are first abstracted in the same way as in training, and then the transitions between abstract states are determined by the corresponding actions and dynamics. ACTL formulas are then model checked on the abstract state transition system.


Building Kripke Structure. During the training phase, the actual state space 
has already been abstracted into a finite set S of abstract states. Therefore, the 
main task for abstract model checking is to build a Kripke structure by defining 
the transition relation on S. 
Algorithm 3 depicts the process of building a Kripke structure $K$ for a trained DRL system. Firstly, $K$ is initialized on the set $\mathbf{S}$ with $R$ being empty. Starting from an initial abstract state $\mathbf{s}^0$, we compute its successors and define the transitions from $\mathbf{s}^0$ to them. We repeat the process until all reachable states are traversed. Given an abstract state $\mathbf{s}$, we compute its abstract successor states by applying the corresponding action $a$ and the dynamics to $\mathbf{s}$. Because the system is trained on abstract states, all the actual states in $\mathbf{s}$ have the same action, i.e., $a = \mathcal{N}'(\mathbf{s})$. Let $f^*(\mathbf{s}, a) = \{f(s, a) \mid s \in \mathbf{s}\}$ be the set of all the successors of the actual states in $\mathbf{s}$. Due to the non-linearity of $f$ and the infinity of $\mathbf{s}$, we over-approximate the set $f^*(\mathbf{s}, a)$ as an interval box. As shown in Fig. 6, the dashed box is an over-approximation of $f^*(\mathbf{s}, a)$. The over-approximation may overlap one or more abstract states, e.g., $\mathbf{s}^1, \ldots, \mathbf{s}^4$ in the example. All the overlapped abstract states are successors of $\mathbf{s}$. In Algorithm 3, function $g$ calculates the interval box and function $h$ determines the overlapped abstract states. Note that the shapes of abstract states may differ because they are refined during training, as detailed in Sect. 4.4.

Algorithm 3: Building Kripke Structure

Input: Initial state $\mathbf{s}^0$, state space $\mathbf{S}$, system dynamics $f$, neural network $\mathcal{N}'$
Output: A Kripke structure $K$

1 $K$ = Initialize_Kripke_Structure()
2 Queue $\leftarrow \{\mathbf{s}^0\}$
3 while Queue is not empty do
4   Fetch $\mathbf{s}$ from Queue
5   for i = 1, ..., n do
6     $[l_i, u_i] \leftarrow g(f(\mathbf{s}, \mathcal{N}'(\mathbf{s})), i)$
7   $\{\mathbf{s}^1, \ldots, \mathbf{s}^m\} := h(([l_1, u_1], \ldots, [l_n, u_n]), \mathbf{S})$
8   for j = 1, ..., m do
9     $K$.add_edge($\mathbf{s} \to \mathbf{s}^j$)
10    if $\mathbf{s}^j$ is not traversed then
11      Push $\mathbf{s}^j$ into Queue
12 return $K$
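A compact Python rendering of Algorithm 3 is sketched below. The callable `successor_box` plays the role of $g$ (for all dimensions at once) and the overlap scan plays the role of $h$; both are assumptions about how these functions could be realized. The one-dimensional toy dynamics $x' = x/2$ is chosen only to make the example easy to follow.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]

def overlaps(a: Box, b: Box) -> bool:
    """Closed boxes overlap iff their intervals intersect in every dimension."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))

def build_kripke(init: List[Box], boxes: List[Box],
                 successor_box: Callable[[Box], Box]) -> Dict[Box, Set[Box]]:
    """Breadth-first construction of the reachable transition relation.
    successor_box(s) over-approximates f*(s, N'(s)) as an interval box (g);
    the scan over boxes finds the overlapped abstract states (h)."""
    R: Dict[Box, Set[Box]] = {}
    queue = deque(init)
    seen = set(init)
    while queue:
        s = queue.popleft()
        approx = successor_box(s)
        R[s] = {t for t in boxes if overlaps(approx, t)}   # edges s -> t
        for t in R[s]:
            if t not in seen:                               # not traversed yet
                seen.add(t)
                queue.append(t)
    return R

# Toy example: four unit boxes on [0, 4] and the contracting dynamics x' = x / 2.
boxes = [((0.0, 1.0),), ((1.0, 2.0),), ((2.0, 3.0),), ((3.0, 4.0),)]

def halve(s: Box) -> Box:
    return tuple((lo / 2, hi / 2) for lo, hi in s)

R = build_kripke([((2.0, 3.0),)], boxes, halve)
```

Because boxes are treated as closed, the image $[1, 1.5]$ of $[2, 3]$ touches $[0, 1]$ at 1.0 and produces an edge to it; this is the kind of extra transition the over-approximation admits. The box $[3, 4]$ is never reached and gets no entry.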

We use an interval to approximate the $i$-th dimension's values in all the successor states. Then, all the successor states are approximated as a vector of $n$ intervals. We can compute the upper and lower bounds for each $i$ by solving the following two optimization problems, respectively:

$$\max_{s \in \mathbf{s}} \; v_i \cdot f(s, \mathcal{N}'(\mathbf{s})) \qquad \text{and} \qquad \min_{s \in \mathbf{s}} \; v_i \cdot f(s, \mathcal{N}'(\mathbf{s})),$$

where $v_i$ is a one-hot vector whose $i$-th element is 1. Because all the states in $\mathbf{s}$ have the same action according to the network, $\mathcal{N}'(\mathbf{s})$ in the above optimization problems can be replaced by a constant, i.e., the action taken by the system on all the states in $\mathbf{s}$. The substitution significantly simplifies the optimization problems: no information about the network is needed in the simplified problems. The simplified problems can be efficiently solved using off-the-shelf scientific computing tools such as SciPy [48].

Fig. 6. Transitions between abstract states.

We consider an example in the mountain car system. We assume that the current abstract state $\mathbf{s}$ is $([0, 0.2], [0, 0.02])$ and the adopted action is $0.001$, which means that the controller accelerates to the right for all states in $\mathbf{s}$. Based on the dynamics defined by Eq. 1, we can compute the upper bounds of both position and velocity in the next step by solving the following two optimization problems:

$$\max_{p_t \in [0, 0.2],\; v_t \in [0, 0.02]} \; p_t + v_t \qquad (p_{t+1})$$
$$\max_{p_t \in [0, 0.2],\; v_t \in [0, 0.02]} \; v_t + 0.001 - 0.0025\cos(3p_t) \qquad (v_{t+1})$$

The lower bounds of $p_{t+1}$ and $v_{t+1}$ are calculated similarly. Then, we obtain an abstract state $\mathbf{s}' = ([0, 0.22], [-0.0035, 0.0165])$, which is an overestimated set of all the actual successors of the states in $\mathbf{s}$. There is a transition from $\mathbf{s}$ to any abstract state $\mathbf{s}'' = ([\underline{p}, \overline{p}], [\underline{v}, \overline{v}])$ in $\mathbf{S}$ if $\mathbf{s}'$ and $\mathbf{s}''$ overlap, i.e., if $\underline{p} \le 0.22 \wedge \overline{p} \ge 0 \wedge \underline{v} \le 0.0165 \wedge \overline{v} \ge -0.0035$ is true. Note that the transitions from $\mathbf{s}$ cover all the transitions from the actual states in $\mathbf{s}$ to their actual successors. They may also include transitions that do not actually exist, due to the overestimation.
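In Trainify, the optimization problems above are solved with SciPy. As a swapped-in alternative for illustration, the sketch below bounds the same two expressions with interval arithmetic. This encloses the true range up to floating-point rounding (which the sketch does not direct outward) and is exact here because $p_t$ and $v_t$ vary independently. It uses the update form written in the objectives above. The position bound matches the 0.22 in the text; the velocity interval depends on the exact constants of Eq. 1, which lies outside this excerpt, so it is not meant to reproduce the interval quoted above. The `overlaps` helper is the closed-box overlap test that decides which abstract states become successors.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]

def icos(lo: float, hi: float) -> Interval:
    """Range of cos over [lo, hi]: the endpoint values, widened to 1 (or -1)
    when the interval contains a multiple of 2*pi (or an odd multiple of pi)."""
    if hi - lo >= 2 * math.pi:
        return (-1.0, 1.0)
    low = min(math.cos(lo), math.cos(hi))
    high = max(math.cos(lo), math.cos(hi))
    if 2 * math.pi * math.ceil(lo / (2 * math.pi)) <= hi:                    # contains 2k*pi
        high = 1.0
    if math.pi + 2 * math.pi * math.ceil((lo - math.pi) / (2 * math.pi)) <= hi:  # contains (2k+1)*pi
        low = -1.0
    return (low, high)

def mc_successor_box(p: Interval, v: Interval, a: float) -> Tuple[Interval, Interval]:
    """Interval box enclosing (p + v, v + a - 0.0025*cos(3p)) for all p, v in the box."""
    cos_lo, cos_hi = icos(3 * p[0], 3 * p[1])   # 3p is increasing in p
    p_next = (p[0] + v[0], p[1] + v[1])
    v_next = (v[0] + a - 0.0025 * cos_hi, v[1] + a - 0.0025 * cos_lo)
    return p_next, v_next

def overlaps(a: Tuple[Interval, ...], b: Tuple[Interval, ...]) -> bool:
    """Closed boxes overlap iff their intervals intersect in every dimension."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))

p_next, v_next = mc_successor_box((0.0, 0.2), (0.0, 0.02), 0.001)
```

The successor box then yields a transition to, for example, `((0.2, 0.4), (0.0, 0.02))`, which it overlaps, but not to `((0.3, 0.4), (0.0, 0.02))`, which lies beyond $p = 0.22$.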

There are other approaches to over-approximating the set $f^*(\mathbf{s}, a)$, such as template polyhedra like rectangles and octagons [2]. There is always a trade-off between the tightness of the polyhedron and the efficiency of computing it. For instance, an octagon can approximate the set more tightly than a rectangle, but computing its borders costs twice the effort. The tighter an over-approximation is, the more accurate the set of computed successors is, but the more time it takes to compute the approximation.


Property-Based Abstraction. For high-dimensional DRL systems, the abstract state space may still be too large to model check directly when the abstraction granularity becomes small after refinement. To improve the scalability of model checking, we further abstract the constructed Kripke structure based on the ACTL formula $\Phi$ to be model checked, using the abstraction approach in [9].


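Definition 1 (State Abstraction). Given an abstract state space $\mathbf{S} = \mathcal{I}_1 \times \cdots \times \mathcal{I}_n$ and an ACTL formula $\Phi$, let $D_\Phi$ be the set of dimensions that occur in $\Phi$ and $\hat{S} = \prod_{d \in D_\Phi} \mathcal{I}_d$. Function $\alpha_\Phi: \mathbf{S} \to \hat{S}$ is an abstract transformer such that for every $\mathbf{s} \in \mathbf{S}$ and $\hat{s} \in \hat{S}$, $\hat{s} = \alpha_\Phi(\mathbf{s})$ if and only if $\hat{s}[d] = \mathbf{s}[d]$ for all $d \in D_\Phi$.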


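Given a Kripke structure $K = (\mathbf{S}, R, \mathbf{S}^0, L)$ and an ACTL formula $\Phi$, let $\alpha_\Phi: \mathbf{S} \to \hat{S}$ be the abstract transformer, and let $\widehat{AP} \subseteq AP$ be all the atomic propositions in $\Phi$. We can construct the following abstract Kripke structure $\hat{K} = (\hat{S}, \hat{R}, \hat{S}^0, \hat{L})$ based on $\alpha_\Phi$, where: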




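- $\hat{S} = \prod_{d \in D_\Phi} \mathcal{I}_d$;
- $\hat{R} = \{(\alpha_\Phi(\mathbf{s}), \alpha_\Phi(\mathbf{s}')) \mid \mathbf{s}, \mathbf{s}' \in \mathbf{S} \wedge (\mathbf{s}, \mathbf{s}') \in R\}$;
- $\hat{S}^0 = \{\alpha_\Phi(\mathbf{s}) \mid \mathbf{s} \in \mathbf{S}^0\}$;
- $\hat{L}: \hat{S} \to 2^{\widehat{AP}}$ such that $\hat{L}(\hat{s}) = L(\mathbf{s}) \cap \widehat{AP}$, where $\mathbf{s} \in \mathbf{S}$ and $\hat{s} = \alpha_\Phi(\mathbf{s})$.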


We call $\hat{K}$ a simulation of $K$ with respect to $\Phi$. An important property of the simulation is that the property represented by $\Phi$ is preserved by the abstract model $\hat{K}$.


Theorem 1 (Soundness). Let $\hat{K}$ be a simulation of $K$ with respect to an ACTL formula $\Phi$. Then $\hat{K} \models \Phi$ implies $K \models \Phi$.


The proof of Theorem 1 is straightforward; we omit it due to space limits. By the theorem, we can conclude that $K \models \Phi$ holds whenever we find a simulation $\hat{K}$ of $K$ and model check that $\hat{K} \models \Phi$ holds.
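The construction of $\hat{K}$ in Definition 1 amounts to projecting every state and transition onto the dimensions in $D_\Phi$. The sketch below does this for a successor-map representation of $K$; the representation, the function names and the toy states are illustrative assumptions, and labels are omitted because they carry over unchanged when $\Phi$ only mentions the kept dimensions.

```python
from typing import Dict, Sequence, Set, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]

def alpha(box: Box, dims: Sequence[int]) -> Box:
    """alpha_Phi: keep only the intervals on the dimensions in D_Phi."""
    return tuple(box[d] for d in dims)

def abstract_kripke(R: Dict[Box, Set[Box]], init: Set[Box],
                    dims: Sequence[int]) -> Tuple[Dict[Box, Set[Box]], Set[Box]]:
    """Project the transitions and initial states of K through alpha_Phi."""
    R_hat: Dict[Box, Set[Box]] = {}
    for s, succs in R.items():
        R_hat.setdefault(alpha(s, dims), set()).update(alpha(t, dims) for t in succs)
    init_hat = {alpha(s, dims) for s in init}
    return R_hat, init_hat

# Two-dimensional abstract states (p, v); the formula is assumed to mention p only.
s1 = ((0.0, 0.1), (0.0, 0.01))
s2 = ((0.0, 0.1), (0.01, 0.02))
s3 = ((0.1, 0.2), (0.0, 0.01))
R_hat, init_hat = abstract_kripke({s1: {s2}, s2: {s3}, s3: {s3}}, {s1}, dims=[0])
```

Merging $\mathbf{s}_1$ and $\mathbf{s}_2$ creates a self-loop on $[0, 0.1]$ that no path of $K$ takes. Such extra behaviors can make $\hat{K}$ fail a property that $K$ satisfies, whereas Theorem 1 only guarantees the direction from $\hat{K} \models \Phi$ to $K \models \Phi$.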


[Figure: (a) Counterexamples on an R-tree whose root covers $p \in [-1.2, 0.2]$, $v \in [-0.07, 0.02]$. (b) The R-tree after refinement on the counterexample.]

Fig. 7. An example of refinements on abstract states where properties are violated.


4.4 Counterexample-Guided Refinement 


If a formula $\Phi$ is not verified, our algorithm returns the corresponding counterexamples. A counterexample is an abstract state where $\Phi$ is violated. We refine the abstract state into finer ones and substitute them for it in the abstract state space for further training.

A naive refinement approach subdivides each dimension of states into two intervals. Assuming that a property is violated on an abstract state $\mathbf{s} = ([l_1, u_1], \ldots, [l_n, u_n])$, we can simply divide each dimension evenly into the two intervals $[l_i, (l_i + u_i)/2]$ and $((l_i + u_i)/2, u_i]$, and obtain $2^n$ finer abstract states. Evidently, this refinement may lead to state space explosion, particularly for high-dimensional systems.

In our approach, to avoid state explosion, we only refine states on the dimensions that are used to define the properties being verified. Considering the mountain car example, we assume that the formula is $\mathbf{AF}(p > 0.45)$, saying that the car will eventually reach the hilltop where $p = 0.45$. Suppose that the property fails and counterexamples are returned. We assume that $\mathbf{s} = ([0, 0.2], [0, 0.02])$ is the state where the property is violated, as shown in Fig. 7(a). We bisect the state into two fine-grained sub-states, $\mathbf{s}^1 = ([0, 0.1], [0, 0.02])$ and $\mathbf{s}^2 = ((0.1, 0.2], [0, 0.02])$. Then, we substitute the two fine-grained states for $\mathbf{s}$ on the R-tree for further training. Figure 7(b) shows the new R-tree after the substitution.
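The following sketch contrasts the two refinement strategies: bisecting every dimension (the naive $2^n$ split) versus bisecting only the dimensions used by the property. A plain list of boxes stands in for the R-tree, the helper names are illustrative, and boxes are written as closed pairs rather than distinguishing half-open intervals. It reproduces the example above, where only the position dimension is split.

```python
from itertools import product
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]

def bisect(box: Box, dims: Sequence[int]) -> List[Box]:
    """Split box in half along each dimension in dims; other dimensions are kept.
    With dims = all n dimensions this is the naive 2^n split; with dims = D_Phi
    only 2^|D_Phi| sub-boxes are produced."""
    choices = []
    for i, (lo, hi) in enumerate(box):
        if i in dims:
            mid = (lo + hi) / 2
            choices.append([(lo, mid), (mid, hi)])
        else:
            choices.append([(lo, hi)])
    return [tuple(c) for c in product(*choices)]

def refine(space: List[Box], cexs: Sequence[Box], dims: Sequence[int]) -> List[Box]:
    """Replace each counterexample box by its sub-boxes (the R-tree update in Fig. 7)."""
    bad = set(cexs)
    refined = [b for b in space if b not in bad]
    for b in dict.fromkeys(cexs):          # each counterexample once, in order
        refined.extend(bisect(b, dims))
    return refined

s = ((0.0, 0.2), (0.0, 0.02))              # the counterexample from the text
by_property = bisect(s, [0])               # AF(p > 0.45) mentions position only
naive = bisect(s, [0, 1])                  # split every dimension
new_space = refine([s, ((0.2, 0.4), (0.0, 0.02))], [s], [0])
```

Splitting by the property yields the two sub-states $([0, 0.1], [0, 0.02])$ and $([0.1, 0.2], [0, 0.02])$, whereas the naive split already yields four boxes in two dimensions.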

It is worth mentioning that counterexamples may be false positives. Abstract states may include actual states that are unreachable in the trained system, because the system dynamics are approximated. Unfortunately, it is difficult to check which states are actually unreachable, because checking the reachability of these bad states requires knowing their corresponding initial states. However, the corresponding initial state is enclosed in an abstract state and cannot be identified due to the abstraction. In our approach, we perform refinement without checking whether the counterexamples are real. After refinement, the abstract states become finer-grained, and spurious counterexamples can be eliminated by training and verifying on these finer-grained abstract states. The price of such extra refinements is that more iterations of training and verification are conducted, but the benefit is that the trained systems perform better.


5 Implementation and Evaluation 


5.1 Implementation 


We implement our framework in a prototype toolkit called TRAINIFY, written in Python. In the toolkit, we leverage the open-source library pyModelChecking [6] as the back-end model checker and the scientific computing tool SciPy [48] as the optimization solver.


5.2 Benchmarks and Experimental Settings 


We evaluate the effectiveness of our approach on a wide range of classic control tasks from public benchmarks. For each control task, we train two DRL systems using our approach and the corresponding conventional DRL approach, respectively. We compare the two trained systems in terms of their reliability, verifiability and system performance.


Benchmarks. We choose six classic control problems. Three of them are from 
the DRL training platform Gym [5], including Mountain Car, Pendulum and 
Cartpole. The other three, i.e., B1, B2 and Tora, are the problems that are 
widely used for evaluation by state-of-the-art tools [19,25,27, 28]. 


1. Mountain Car (MC). The running example in Sect. 2. 

2. Pendulum (PD). A pendulum can rotate around a fixed endpoint. Starting from a random position, the pendulum must swing up and stay upright.

3. CartPole (CP). A pole is attached by an un-actuated joint to a cart. The goal of training is to learn a controller that prevents the pole from falling over by applying a force of $+1$ or $-1$ to the cart.
4. B1 and B2. Two classic nonlinear systems, in both of which the agent aims to reach the destination region from the preset initial state space [19].

5. Tora. A cart is attached to a wall with a spring. It is free to move on a 
frictionless surface. Inside the cart, there is an arm free to rotate about an 
axis. The controller’s goal is to stabilize the system at the equilibrium state 
where all the system variables are equal to 0. 


Training Configurations and Evaluation Metrics. For each task, we adopt the same system configurations and training parameters for both approaches, including the neural network architecture, system dynamics, time interval, DRL algorithm and number of training episodes.

We choose three metrics, namely the satisfaction of predefined properties, cumulative reward and robustness, to evaluate and compare the reliability, verifiability and performance of the DRL systems trained with our approach and those trained with the conventional DRL approach for the same task. The first metric concerns reliability and verifiability; the other two concern performance. The cumulative reward is an important figure for evaluating a trained system's performance, because maximizing the cumulative reward is the objective of learning. Robustness is another essential criterion for DRL systems, because the systems are expected to be robust against perturbations from both the environment and adversarial attacks. Note that we classify robustness into the performance category instead of reliability, because we restrict the reliability of DRL systems to their safety and functional requirements.


Experimental Settings. All experiments are conducted on a workstation run- 
ning Ubuntu 18.04 with a 32-core AMD Ryzen Threadripper CPU @ 3.7 GHz 
and 128GB RAM. 


5.3 Reliability and Verifiability Comparison 


We first evaluate the reliability and verifiability of the DRL systems trained with our approach and the conventional approach, respectively. For each task, we predefine system properties according to its safety and functional requirements. The functional requirement is usually the objective of the control task. For instance, the objective of the controller in the mountain car example is to drive the car to the hilltop. We define an atomic proposition $p > 0.45$ to indicate that the car reaches the hilltop. Then, we can define an ACTL formula $\Phi_1 = \mathbf{AF}(p > 0.45)$ to represent the liveness property. Safety requirements in DRL systems usually specify that important parameters of the systems must always be kept in safe ranges. For instance, a safety requirement in the mountain car example is that the car's velocity must be greater than 0.02 when the car is at position 0.2 within a deviation of 0.05. The property can be represented by the ACTL formula $\Phi_2$ as defined in Table 1. The properties of the other tasks are formalized similarly. The formulas and the types of properties are shown in the table.
Table 1. Expected properties and their definitions in ACTL for the selected control tasks.

| Task | ID | ACTL formula | Type | Meaning |
| MC | $\Phi_1$ | $\mathbf{AF}(p > 0.45)$ | Liveness | The car always reaches the target finally. |
| MC | $\Phi_2$ | $\mathbf{AG}(\lvert p - 0.2\rvert < 0.05 \Rightarrow v > 0.02)$ | Safety | The car's speed should be greater than 0.02 at the position 0.2 within a 0.05 deviation. |
| PD | $\Phi_3$ | $\mathbf{AG}(\lvert\theta\rvert < \ldots)$ | Safety | The pendulum angle $\theta$ must always be in the preset range. |
| CP | $\Phi_4$ | $\mathbf{AG}_{i<n}(\lvert p\rvert < 2.4 \wedge \lvert\theta\rvert < 0.21)$ | Safety | The cart always stays in the safe region and the pole cannot fall down in $n$ time steps. |
| B1 | $\Phi_5$ | $\mathbf{AF}(x_1 \in [0, 0.2] \wedge x_2 \in [0.05, 0.3])$ | Liveness | The agent always reaches the target finally. |
| B1 | $\Phi_6$ | $\mathbf{AG}(\lvert x_1\rvert < 1.5 \wedge \lvert x_2\rvert < 1.5)$ | Safety | The agent always stays in the safe region. |
| B2 | $\Phi_7$ | $\mathbf{AF}(\mathit{target})$ | Liveness | The agent always reaches the target finally. |
| B2 | $\Phi_8$ | $\mathbf{A}((\lvert x_1\rvert < 1.5 \wedge \lvert x_2\rvert < 1.5)\,\mathbf{U}\,\mathit{target}) \vee \mathbf{AG}(\lvert x_1\rvert < 1.5 \wedge \lvert x_2\rvert < 1.5)$ | Safety | The agent must stay in the safe region until it reaches the target region. |
| Tora | $\Phi_9$ | $\mathbf{AG}_{i<n}(\lvert x_1\rvert < 1.5 \wedge \lvert x_3\rvert < 1.5)$ | Safety | The agent stays in the preset state space within $n$ time steps. |

Remarks. $\mathit{target}$ is an atomic proposition, i.e., $x_1 \in [-0.3, 0.1] \wedge x_2 \in [-0.35, 0.5]$ in B2.


We compare the reliability and verifiability of all the trained DRL systems with respect to their predefined properties using both verification and simulation. The DRL systems trained with our approach can be naturally verified in our framework. For those trained with conventional DRL approaches, our verification approach is not applicable, because we cannot construct abstract Kripke structures for them: their states cannot be abstracted such that all the actual states represented by the same abstract state share a unique action. We therefore resort to the state-of-the-art reachability analysis tool Verisig 2.0 [25] to verify them. We also simulate all the trained systems for a fixed number of rounds and detect the occurrences of property violations. The purposes of the simulation are twofold: (i) to partially reflect the reliability of the systems; and (ii) to validate the verification results within a bounded number of steps.
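Simulation can only expose violations on finitely many finite traces. The sketch below checks one simulated Mountain Car trace against $\Phi_1$ and $\Phi_2$; an $\mathbf{A}$-quantified property would be approximated by repeating this check over many sampled initial states. The trace is made up, the function names are illustrative, and the comparison operators follow the formulas in Table 1 as extracted, so their strictness is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]   # (position p, velocity v)

def ag_violations(trace: Sequence[State], ok: Callable[[State], bool]) -> List[int]:
    """Steps of a finite simulated trace at which an AG invariant is violated."""
    return [t for t, s in enumerate(trace) if not ok(s)]

def af_reached(trace: Sequence[State], goal: Callable[[State], bool]) -> bool:
    """Bounded check of AF goal on one trace. False only means the goal was
    not reached within len(trace) steps; it does not refute AF by itself."""
    return any(goal(s) for s in trace)

def at_hilltop(s: State) -> bool:
    """Atomic proposition of Phi_1: p > 0.45."""
    return s[0] > 0.45

def phi2_ok(s: State) -> bool:
    """Body of Phi_2: |p - 0.2| < 0.05 implies v > 0.02."""
    p, v = s
    return (not (abs(p - 0.2) < 0.05)) or v > 0.02

# A made-up simulated trace.
trace = [(-0.5, 0.0), (0.18, 0.03), (0.22, 0.01), (0.46, 0.02)]
```

On this trace, step 2 violates $\Phi_2$ (the car is near position 0.2 with velocity 0.01), while $\Phi_1$ is witnessed at step 3. Truncating the trace before step 3 shows why a bounded simulation cannot refute a liveness property on its own.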

Table 2 shows the comparison results. We can observe that all the systems trained with our approach are successfully verified, and the corresponding properties hold on them. No violations are detected by simulation. For the systems trained with conventional DRL algorithms, only 8 out of 16 are successfully verified by Verisig 2.0. In two cases, Verisig 2.0 returns Unknown when verifying $\Phi_7$ for task B2. This means that the verification fails because Verisig 2.0 cannot determine whether the destination region (defined by $x_1 \in [-0.3, 0.1] \wedge x_2 \in [-0.35, 0.5]$) is always reached when it computes a larger region that overlaps the target. The extra part of the larger region may be an overestimation caused by the over-approximation. By simulation, we detect violations of $\Phi_7$, which can be considered counterexamples to the property. The other properties, such as $\Phi_2$, $\Phi_3$, $\Phi_4$ and $\Phi_8$, are not supported by Verisig 2.0. Among these unverified properties, simulation detects violations for three of them. The violations indicate that the systems trained with conventional DRL approaches may not satisfy the expected properties, and that existing state-of-the-art verification tools cannot always verify them or find violations. Our approach guarantees that the trained systems satisfy the properties, and the simulation results show that there are indeed no violations.

Table 2. Comparison of the verification and simulation results between the DRL systems trained with our approach and with conventional DRL algorithms, respectively.

| Task | A.F. | Network size | Property | Trainify T.T. | V.R. | V.T. | Vio. | Conventional T.T. | V.R. | V.T. | Vio. |
| MC | Sigmoid | 2 × 16 | $\Phi_1$ | 306 | ✓ | 2.8 | 0 | 297 | ✓ | 45.5 | 0 |
| MC | Sigmoid | 2 × 16 | $\Phi_2$ | 302 | ✓ | 59 | 0 | 297 | N/A | - | 0 |
| MC | Sigmoid | 2 × 200 | $\Phi_1$ | 453 | ✓ | 29.1 | 0 | 441 | ✓ | 3709 | 0 |
| MC | Sigmoid | 2 × 200 | $\Phi_2$ | 462 | ✓ | … | 0 | 441 | N/A | - | 0 |
| PD | ReLU | 3 × 128 | $\Phi_3$ | 771 | ✓ | 12 | 0 | 501 | N/A | - | 0 |
| CP | ReLU | 3 × 64 | $\Phi_4$ | 135 | ✓ | 3266 | 0 | 101 | N/A | - | 12 |
| B1 | … | … | $\Phi_5$ | 5 | ✓ | 890 | 0 | 3 | ✓ | 6 | 0 |
| B1 | … | … | $\Phi_6$ | 43 | ✓ | 53 | 0 | 31 | ✓ | 4.6 | 0 |
| B1 | Tanh | 2 × 100 | $\Phi_5$ | 3 | ✓ | … | 0 | … | ✓ | … | 0 |
| B1 | Tanh | 2 × 100 | $\Phi_6$ | 25 | ✓ | 38 | 0 | 41 | ✓ | 28.2 | 0 |
| B2 | Tanh | 2 × 20 | $\Phi_7$ | … | ✓ | 12 | 0 | 9 | Unknown | 4.8 | 27 |
| B2 | Tanh | 2 × 20 | $\Phi_8$ | 9 | ✓ | … | 0 | 9 | N/A | - | 0 |
| B2 | Tanh | 2 × 100 | $\Phi_7$ | 9 | ✓ | 1.3 | 0 | 11 | Unknown | 55.3 | 23 |
| B2 | Tanh | 2 × 100 | $\Phi_8$ | 6 | ✓ | … | 0 | … | N/A | - | 0 |
| Tora | Tanh | 3 × 100 | $\Phi_9$ | 402 | ✓ | 1132 | 0 | 217 | ✓ | 1271 | 0 |
| Tora | Tanh | 3 × 200 | $\Phi_9$ | 495 | ✓ | 1242 | 0 | 239 | ✓ | 6829 | 0 |

Remarks. A.F.: activation function; T.T.: average training time per iteration; V.R.: verification result; V.T.: average verification time per iteration; Vio.: the number of violations in simulation; N/A: not applicable; Unknown: verification fails. Time is recorded in seconds.

As for efficiency, our approach on average costs slightly more training time, because it takes extra time to look up the corresponding abstract state for an actual state at every training step. This small overhead is worthwhile for the sake of verifiability. Besides verifiability, another benefit of this extra time cost is that the efficiency of verification in our approach is not affected by the size and type of the neural networks, because we treat them as black boxes during verification. In contrast, the efficiency of verifying systems trained in conventional approaches is limited by the neural networks, as the verification times of Verisig 2.0 show.

Based on the above analysis, we conclude that the DRL systems developed in our approach are more trustworthy, as their predefined properties are provably satisfied by the systems. Moreover, their verification is more tractable and scalable than that of the systems trained in conventional DRL approaches.


5.4 Performance Comparison 


We compare the performance of the DRL systems trained in our approach and in the conventional approaches in terms of cumulative reward and robustness.
Fig. 8. Robustness comparison of the systems trained in our approach (blue) and in conventional approaches (orange). The number in parentheses is the base of $\sigma$. For example, in Mountain Car, when the abscissa is equal to 50, $\sigma = 50 \times 0.0005 = 0.025$.

(Color figure online)


Cumulative Reward. We record the cumulative reward by running each system for 100 episodes in the simulation environment and calculating the averages. A larger reward implies that a system has a better performance. Table 3 shows the cumulative reward of the DRL systems for the six tasks, trained in our approach and in conventional approaches, respectively.

Table 3. Comparison of accumulated reward.

Case  Alg.  Network        TRAINIFY  Base
MC    DQN   Sigmoid 2x16   -112      -116
MC    DQN   Sigmoid 2x200  -110      -111
PD    DDPG  ReLU 3x128     -131      -133
CP    DQN   ReLU 3x64      500       500
B1    DDPG  Tanh 2x20      -120      -120
B1    DDPG  Tanh 2x100     ?         ?
B2    DDPG  Tanh 2x20      ?         ?
B2    DDPG  Tanh 2x100     ?         ?
Tora  DDPG  Tanh 3x100     50        50
Tora  DDPG  Tanh 3x200     50        50

All the trained systems achieve almost optimal cumulative reward. Among the ten cases, the systems trained in our approach perform better in four cases, equivalently in four cases, and worse in the remaining two cases. Note that there are differences, which are due to floating-point errors, but they are almost negligible. In this sense, we say that the performance of the systems trained in the two approaches is comparable.

Another observation from the results is that a system with a bigger neural network produces a larger reward. This characteristic is shared by our approach and the conventional approaches. Thus, in our approach, we can increase the size of the networks and even modify the network architectures for better performance. Such changes will not incur extra verification cost, because our approach is entirely black-box, using the network only to output actions for the given abstract states.


Robustness. We demonstrate that the systems trained in our approach can be more robust than those trained with conventional DRL algorithms when the perturbation is within a reasonable range. To examine robustness, we add Gaussian noise to the actual states of the systems and check their cumulative reward under different levels of perturbation. Given an actual state $s = (s_1, \ldots, s_n)$, we add noise $X_1, \ldots, X_n$ to $s$ and obtain a perturbed state $s' = (s_1 + X_1, \ldots, s_n + X_n)$, where $X_i \sim \mathcal{N}(\mu, \sigma^2)$ for $1 \le i \le n$ with $\mu = 0$. We start with $\sigma = 0$ and increase it gradually.
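The perturbation and the repeated evaluation can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: states are real-valued vectors, and `perturb`, `reward_under_noise`, and the `run_episode` callback (standing in for one simulator rollout) are hypothetical names, not part of Trainify.

```python
import numpy as np


def perturb(state, sigma, rng):
    """Return s' = s + X with X_i ~ N(0, sigma^2) drawn independently per component."""
    state = np.asarray(state, dtype=float)
    return state + rng.normal(loc=0.0, scale=sigma, size=state.shape)


def reward_under_noise(run_episode, sigma, repetitions, rng):
    """Mean and standard deviation of the cumulative reward over repeated noisy episodes.

    `run_episode` is a hypothetical callback: it receives a function that perturbs
    an observed state and returns the cumulative reward of one episode.
    """
    rewards = [
        run_episode(lambda s: perturb(s, sigma, rng)) for _ in range(repetitions)
    ]
    return float(np.mean(rewards)), float(np.std(rewards))
```

Following the Fig. 8 caption for Mountain Car, the 200 noise levels could be generated as `[k * 0.0005 for k in range(200)]`, calling `reward_under_noise(..., repetitions=20, ...)` at each level to obtain the solid line (mean) and shadow (standard deviation).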

Figure 8 shows how the cumulative reward of the systems changes as the perturbation increases. For each system, we evaluate 200 different levels of perturbation, and for each level we conduct 20 repetitions to obtain the average and standard deviation of the reward, represented by the solid lines and shadows in Fig. 8. The general trend is that the cumulative reward deteriorates for all the systems, whichever approach they were trained in. This result is reasonable, because the actions computed by the neural networks are optimal for non-perturbed states but may not be optimal for perturbed ones, leading to lower reward at some steps. However, the decline rate of the systems trained in our approach (blue) is smaller than that of the systems trained in conventional approaches (orange). When $\sigma = 0$, the accumulated reward of the two systems for the same task is almost the same. As $\sigma$ increases within a reasonably small range, the performance declines more slowly for the systems trained in our approach than for those trained in the conventional approaches. This is because a perturbed state may belong to the same abstract state as its original state, and thus receives the same optimal action. In this sense, the perturbation is absorbed by the abstract state, and the neural networks become less sensitive to perturbations. Our additional experiments on these examples show that a larger abstraction granularity produces a more robust system.
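The absorption effect can be illustrated with a toy interval abstraction. This is a sketch, assuming a uniform grid of boxes with fixed per-dimension widths; the helper `abstract_state` and the numbers are hypothetical and only show why a coarser granularity absorbs larger perturbations.

```python
import numpy as np


def abstract_state(state, lower, width):
    """Index of the interval box containing `state` on a uniform grid.

    Dimension i is cut into intervals [lower[i] + k*width[i], lower[i] + (k+1)*width[i]);
    the tuple of per-dimension indices k identifies the abstract state.
    """
    offset = np.asarray(state, dtype=float) - np.asarray(lower, dtype=float)
    idx = np.floor(offset / np.asarray(width, dtype=float)).astype(int)
    return tuple(int(k) for k in idx)


s = [0.05, 0.05]
s_small_noise = [0.06, 0.04]
s_large_noise = [0.15, 0.05]
fine, coarse = [0.1, 0.1], [0.5, 0.5]
lower = [-1.0, -1.0]

# A small perturbation stays in the same fine-grained box ...
print(abstract_state(s, lower, fine) == abstract_state(s_small_noise, lower, fine))  # → True
# ... a larger one leaves it ...
print(abstract_state(s, lower, fine) == abstract_state(s_large_noise, lower, fine))  # → False
# ... but is still absorbed by a coarser box, which maps to the same action.
print(abstract_state(s, lower, coarse) == abstract_state(s_large_noise, lower, coarse))  # → True
```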


6 Related Work 


Our work has been inspired by several related works, which attempted to inte- 
grate formal methods and DRL approaches. We classify them into the following 
three categories. 


Verification-in-the-Loop Training. Verification-in-the-loop training has been proposed for developing reliable AI-powered systems. In pioneering work, Nilsson et al. proposed a correct-by-construction approach for developing Adaptive Cruise Control (ACC) by first formally defining safety properties in Linear Temporal Logic (LTL) and then computing the safe domain where the LTL specification can be enforced [36]. Wang et al. proposed a correct-by-construction control learning framework that leverages verification during training to formally guarantee that the learned controller satisfies the required reach-avoid property [49]. Lin et al. proposed an approach for training robust neural networks for general classification problems by fine-tuning the parameters of the networks based on the verification result [33]. Our work continues this line of research, with the new features of training on abstract states, counterexample-guided abstraction and refinement, and support for more complex properties.


Safe DRL via Formal Methods. Most existing approaches for formal verification of DRL systems follow the train-then-verify style. Bacci and Parker [3] proposed an approach that splits an abstract domain into fine-grained ones and computes their successor abstract states separately for probabilistic model checking of DRL systems. The approach can reduce overestimation while constructing a transition system over abstract states, which allows us to verify more complex liveness and probabilistic properties than safety using bounded model checking [29] and probabilistic model checking. One criterion for subdividing an abstract domain is to ensure that all the states in the same subdomain have the same action. Identifying these subdomains is computationally expensive because it relies on iterative branching and bounding [3]. Furthermore, these approaches need to compute the output range of the neural networks on the abstract domains, and are therefore restricted to specific types and scales of networks. Besides model checking, reachability analysis [13,16,25,46] has been well studied to ensure the safety of DRL systems. The basic idea is to over-approximate the system dynamics and neural networks to compute overestimated safe regions and check whether they intersect with unsafe regions. However, large overestimation, limited scalability, and requirements on specific network architectures are common restrictions of these approaches. Online verification [47] and runtime monitoring [18] are other lightweight but effective formal means of detecting potential flaws in a timely manner during system execution. Another direction is to synthesize safe shields [7,54] and barrier functions [53] to prevent agents from taking dangerous actions. A strong assumption of these methods is that the set of valid safe states is given in advance. However, computing this set may be computationally intensive, and these methods are restricted to safety properties.


Abstraction and State Discretization in DRL. Abstraction in DRL has gained increasing attention in recent years. Abel presented a theory of abstraction for DRL in his dissertation and concluded that learning on abstractions can be more efficient while preserving near-optimal behaviors [1]. Abel's abstraction theory focuses on systems with finite state spaces, for learning efficiency. Our work demonstrates another advantage of learning on abstractions, i.e., formal reliability guarantees for trained systems, even with infinite state spaces.

The state-space abstraction approach in our framework is also inspired by state-space discretization, a technique for discretizing continuous state spaces in which a finer partition of the state-action space is maintained during training for higher payoff estimates [41,42]. Our work shows that, after being integrated with formal verification, state-space discretization is also useful in developing highly reliable DRL systems without loss of performance. In addition, our CEGAR-driven approach provides a flexible mechanism for fine-tuning the granularity of discretization to reach an appropriate balance between system performance and the scale of the state space for formal verification.


7 Discussion and Conclusion 


We have presented a novel verification-in-the-loop framework for training and verifying DRL systems, driven by counterexample-guided abstraction and refinement. The framework can be used to train reliable DRL systems whose desired safety and functional properties are formally verified, without compromising system performance. We have implemented a prototype, TRAINIFY, and evaluated it by training DRL systems for six classic control problems from public benchmarks. The experimental results showed that the systems trained in our approach were more reliable and verifiable than those trained in conventional DRL approaches, while their performance is comparable to, or even better than, that of the latter.

Our verification-in-the-loop training approach sheds light on a new research direction for developing reliable and verifiable AI-empowered systems. It follows the idea of correctness-by-construction in traditional trustworthy software system development and makes it possible to take system properties (or requirements) into account during the training process. It also reveals that (i) it is not necessary to learn on actual data to build high-performance (e.g., high-reward and robust) DRL systems, and (ii) abstraction is an effective means of dealing with the challenges in verifying DRL systems and should be introduced early, during training, rather than as an ex post facto method in verification.

We hope our work will inspire more research in this direction. One important research objective is to investigate appropriate abstractions for high-dimensional DRL systems. In our current framework, we adopt the simplest interval abstraction, which suffices for low-dimensional systems. It would be interesting to apply more sophisticated abstractions, such as floating-point polyhedra combined with intervals, designed mainly for neural networks [43], to high-dimensional DRL systems. Another direction is to extend our framework to non-deterministic DRL systems. In the non-deterministic case, a neural network returns both actions and their corresponding probabilities. We can associate probabilities with state transitions and obtain a probabilistic model, which can be naturally verified using existing probabilistic model checkers such as PRISM [30]. We therefore believe that our approach is also applicable to those systems after a slight extension, which we leave for future work.
Abstract. Neural networks are very successful at detecting patterns in noisy
data, and have become the technology of choice in many fields. However, their 
usefulness is hampered by their susceptibility to adversarial attacks. Recently, 
many methods for measuring and improving a network’s robustness to adversar- 
ial perturbations have been proposed, and this growing body of research has given 
rise to numerous explicit or implicit notions of robustness. Connections between 
these notions are often subtle, and a systematic comparison between them is miss- 
ing in the literature. In this paper we begin addressing this gap, by setting up gen- 
eral principles for the empirical analysis and evaluation of a network’s robustness 
as a mathematical property—during the network’s training phase, its verification, 
and after its deployment. We then apply these principles and conduct a case study 
that showcases the practical benefits of our general approach. 


Keywords: Neural Networks - Adversarial Training - Robustness - Verification 


1 Introduction 


Safety and security are critical for many complex systems that use deep neural networks 
(DNNs). Unfortunately, due to the opacity of DNNs, these properties are difficult to 
ensure. Perhaps the most famous instance of this problem is guaranteeing the robustness 
of DNN-based systems against adversarial attacks [5,17]. Intuitively, a neural network is $\epsilon$-ball robust around a particular input if, when you move no more than $\epsilon$ away from that input in the input space, the output does not change much; or, alternatively, the classification decision that the network gives does not change. Even highly accurate DNNs will often display only low robustness, and so measuring and improving the adversarial robustness of DNNs has received significant attention from both the machine learning and verification communities [7,8,15].

As a result, neural network verification often follows a continuous verification cycle [9], which involves retraining neural networks with a given verification property in mind, as Fig. 1 shows. More generally, such training can be regarded as a way to impose a formal specification on a DNN; and so, apart from improving its robustness, it may also contribute to the network's explainability and facilitate its verification. Due to the high level of interest in adversarial robustness, numerous approaches have been proposed for performing such retraining in recent years, each with its own specific details. However, it is quite unclear what benefits each approach offers from a verification point of view.

The primary goal of this case-study paper is to introduce a more holistic methodology, which puts the verification property at the centre of the development cycle and in turn permits a principled analysis of how this property influences both training and verification practices. In particular, we analyse the verification properties that implicitly or explicitly arise from the most prominent families of training techniques: data augmentation [14], adversarial training [5,10], Lipschitz robustness training [1,12], and training with logical constraints [4,20]. We study the effect of each of these properties on verifying the DNN in question.

In Sect. 2, we start with the forward direction of the continuous verification cycle and show how the above training methods give rise to logical properties of classification robustness (CR), strong classification robustness (SCR), standard robustness (SR) and Lipschitz robustness (LR). In Sect. 4, we trace the opposite direction of the cycle, i.e. we show how and when the verifier's failure to prove these properties can be mitigated. Before that, Sect. 3 gives an auxiliary logical link for making this step: given a robustness property as a logical formula, we can use it not just in verification, but also in attacks and in property accuracy measurements. We take property-driven attacks as a valuable tool in our study, both in training and in evaluation. Section 4 makes the underlying assumption that verification requires retraining: it shows that the verifier succeeds on only 0 to 1.5% of inputs for an accurate baseline network. We show how our logical understanding of robustness properties empowers us in property-driven training and in verification. We first give abstract arguments for why certain properties are stronger than others or incomparable, and then we use training, attacks and the verifier Marabou to confirm them empirically. Sections 5 and 6 add other general considerations for setting up the continuous verification loop and conclude the paper.


Fig. 1. Continuous Verification Cycle 


2 Existing Training Techniques and Definitions of Robustness 


Data Augmentation is a straightforward method for improving robustness via train- 
ing [14]. It is applicable to any transformation of the input (e.g. addition of noise, trans- 
lation, rotation, scaling) that leaves the output label unchanged. To make the network 
robust against such a transformation, one augments the dataset with instances sampled 
via the transformation. 

More formally, given a neural network $N: \mathbb{R}^n \to \mathbb{R}^m$, the goal of data augmentation is to ensure classification robustness, which is defined as follows. Given a training dataset input-output pair $(\hat{x}, y)$ and a distance metric $|\cdot - \cdot|$, we say that $N$ is classification-robust if, for all inputs $x$ within distance $\epsilon$ of $\hat{x}$, class $y$ has the largest score in the output $N(x)$.


Definition 1 (Classification robustness).
$\mathrm{CR}(\epsilon, \hat{x}) \triangleq \forall x: |x - \hat{x}| \le \epsilon \Rightarrow \arg\max N(x) = y$


In order to apply data augmentation, an engineer needs to specify: (c1) the value of $\epsilon$, i.e. the admissible range of perturbations; (c2) the distance metric, which is determined according to the admissible geometric perturbations; and (c3) the sampling method used to produce the perturbed inputs (e.g., random sampling, adversarial attacks, generative algorithms, prior knowledge of images).
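A minimal sketch of data augmentation with one concrete set of choices: (c1) a given $\epsilon$, (c2) the $L_\infty$ metric, and (c3) uniform random sampling. These choices and the helper name `augment` are assumptions made for illustration only.

```python
import numpy as np


def augment(X, y, eps, copies, rng):
    """Append `copies` perturbed versions of every input, sampled uniformly from the
    L-infinity eps-ball around it; labels are copied unchanged."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    noise = rng.uniform(-eps, eps, size=(copies,) + X.shape)
    X_aug = np.concatenate([X] + [X + n for n in noise], axis=0)
    y_aug = np.concatenate([y] * (copies + 1), axis=0)
    return X_aug, y_aug
```

Swapping the sampler for an attack such as PGD (choice c3) turns this into the adversarial flavour of augmentation while leaving the target property CR unchanged.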

Classification robustness is straightforward, but does not account for the possibil- 
ity of having “uncertain” images in the dataset, for which a small perturbation ideally 
should change the class. For datasets that contain a significant number of such images, 
attempting this kind of training could lead to a significant reduction in accuracy. 

Adversarial training is a current state-of-the-art method for robustifying a neural network. Whereas standard training tries to minimise the loss between the predicted value, $f(\hat{x})$, and the true value, $y$, for each entry $(\hat{x}, y)$ in the training dataset, adversarial training minimises the loss with respect to the worst-case perturbation of each sample in the training dataset. It therefore replaces the standard training objective $\mathcal{L}(\hat{x}, y)$ with $\max_{x: |x - \hat{x}| \le \epsilon} \mathcal{L}(x, y)$. Algorithmic solutions to the maximisation problem that find the worst-case perturbation have been the subject of several papers. The earliest suggestion was the Fast Gradient Sign Method (FGSM) algorithm introduced by [5]:


$\mathrm{FGSM}(\hat{x}) = \hat{x} + \epsilon \cdot \mathrm{sign}(\nabla_x \mathcal{L}(x, y))$


However, modern adversarial training methods usually rely on some variant of the Projected Gradient Descent (PGD) algorithm [11], which iterates FGSM:


$\mathrm{PGD}_0(\hat{x}) = \hat{x}; \qquad \mathrm{PGD}_{t+1}(\hat{x}) = \mathrm{PGD}_t(\mathrm{FGSM}(\hat{x}))$
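To make the two attacks concrete, the following sketch applies them to a linear softmax classifier, chosen (as an assumption) because the gradient of the cross-entropy loss with respect to the input has the closed form $W^\top(p - e_y)$. The step size `alpha` and the clipping back onto the $L_\infty$ $\epsilon$-ball are the usual ingredients of PGD implementations; they are not part of the recursion above and are labelled here as assumptions.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def loss_and_grad(W, b, x, y):
    """Cross-entropy loss of a linear softmax classifier and its gradient w.r.t. x."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    loss = -np.log(p[y])
    grad_x = W.T @ (p - onehot)  # chain rule through the logits W x + b
    return float(loss), grad_x


def fgsm(W, b, x_hat, y, eps):
    """One step of size eps along the sign of the loss gradient at x_hat."""
    _, g = loss_and_grad(W, b, x_hat, y)
    return x_hat + eps * np.sign(g)


def pgd(W, b, x_hat, y, eps, alpha, steps):
    """Iterated signed-gradient steps of size alpha, each projected back onto the
    L-infinity eps-ball around x_hat (np.clip is that projection)."""
    x = x_hat.copy()
    for _ in range(steps):
        _, g = loss_and_grad(W, b, x, y)
        x = np.clip(x + alpha * np.sign(g), x_hat - eps, x_hat + eps)
    return x
```

Because this loss is convex in $x$, neither attack can lower the loss below its clean value; for a real network, the gradient would come from automatic differentiation instead of the closed form.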


It has been empirically observed that neural networks trained using this family 
of methods exhibit greater robustness at the expense of an increased generalisation 
error [10,18,21], which is frequently referred to as the accuracy-robustness trade-off 
for neural networks (although this effect has been observed to disappear as the size of 
the training dataset grows [13]). 

In logical terms, what is this procedure trying to train for? Let us assume that there is some maximum distance, $\delta$, by which it is acceptable for the output to be perturbed given the size of perturbations in the input. This leads us to the following definition, where $\|\cdot - \cdot\|$ is a suitable distance function over the output space:


Definition 2 (Standard robustness).


$\mathrm{SR}(\epsilon, \delta, \hat{x}) \triangleq \forall x: |x - \hat{x}| \le \epsilon \Rightarrow \|f(x) - f(\hat{x})\| \le \delta$
We note that, just as with data augmentation, choices c1 to c3 still have to be made, although the sampling methods are usually given by special-purpose FGSM/PGD heuristics based on computing the gradients of the loss function.


Training for Lipschitz Robustness. More recently, a third competing definition of 
robustness has been proposed: Lipschitz robustness [2]. Inspired by the well-established 
concept of Lipschitz continuity, Lipschitz robustness asserts that the distance between 
the original output and the perturbed output is at most a constant L times the change in 
the distance between the inputs. 


Definition 3 (Lipschitz robustness).
$\mathrm{LR}(\epsilon, L, \hat{x}) \triangleq \forall x: |x - \hat{x}| \le \epsilon \Rightarrow \|f(x) - f(\hat{x})\| \le L\,|x - \hat{x}|$


As will be discussed in Sect. 4, this is a stronger requirement than standard robustness. Techniques for training for Lipschitz robustness include formulating it as a semi-definite programming optimisation problem [12] or including a projection step that restricts the weight matrices to those with suitable Lipschitz constants [6].
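One common way to realise such a projection step is spectral-norm rescaling, sketched below as an assumption (not necessarily the method of [6]). For a feed-forward network with 1-Lipschitz activations such as ReLU, the product of the layers' spectral norms upper-bounds the network's $L_2$ Lipschitz constant; biases do not affect it. Both helper names are hypothetical.

```python
import numpy as np


def project_spectral_norm(W, c):
    """Rescale W so that its spectral norm (largest singular value) is at most c."""
    sigma_max = np.linalg.norm(W, ord=2)
    return W if sigma_max <= c else W * (c / sigma_max)


def lipschitz_upper_bound(weights):
    """Product of the layers' spectral norms: an upper bound on the L2 Lipschitz
    constant of a feed-forward network with 1-Lipschitz activations (e.g. ReLU)."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))
```

Applying `project_spectral_norm` after every optimiser step keeps the bound below the product of the chosen per-layer constants, which is one way to make a target $L$ in Definition 3 attainable by construction.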


Training with Logical Constraints. Logically, this discussion leads one to ask whether a more general approach to constraint formulation may exist, and several attempts in the literature have addressed this research question [4,20] by proposing methods that can translate a first-order logical formula $C$ into a constraint loss function $\mathcal{L}_C$. The loss function penalises the network when outputs do not satisfy a given Boolean constraint, and universal quantification is handled by a choice of sampling method. Our standard loss function $\mathcal{L}$ is substituted with:


$\mathcal{L}^*(\hat{x}, y) = \alpha\,\mathcal{L}(\hat{x}, y) + \beta\,\mathcal{L}_C(\hat{x}, y) \qquad (1)$


where the weights $\alpha$ and $\beta$ control the balance between the standard and constraint loss.
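As an illustration of Eq. (1), the sketch below turns the SR constraint into a penalty at one sampled point, using the DL2-style translation of an atomic comparison $t_1 \le t_2$ into $\max(t_1 - t_2, 0)$ (our reading of [4]; treat it as an assumption). Following the argument later in this section, the penalty compares against the current output $f(\hat{x})$ rather than the label. All function names are hypothetical, and the universal quantifier is left to whatever sampler produces `x`.

```python
import numpy as np


def dl2_leq(t1, t2):
    """DL2-style penalty for the atomic constraint t1 <= t2: zero iff satisfied."""
    return max(t1 - t2, 0.0)


def sr_constraint_loss(f, x_hat, x, delta):
    """Penalty for violating ||f(x) - f(x_hat)|| <= delta at one sampled point x."""
    dist = float(np.linalg.norm(f(x) - f(x_hat)))
    return dl2_leq(dist, delta)


def combined_loss(standard_loss, constraint_loss, alpha, beta):
    """Weighted objective of Eq. (1): alpha * L + beta * L_C."""
    return alpha * standard_loss + beta * constraint_loss
```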

This method deceptively looks like a generalisation of previous approaches. However, even given suitable choices for c1 to c3, classification robustness cannot be modelled via a constraint loss in the DL2 [4] framework, as arg max is not differentiable. Instead, [4] defines an alternative constraint, which we call strong classification robustness:


Definition 4 (Strong classification robustness).
$\mathrm{SCR}(\epsilon, \eta, \hat{x}) \triangleq \forall x: |x - \hat{x}| \le \epsilon \Rightarrow f(x)_y > \eta$


which looks only at the prediction for the true class and checks whether it is greater than some value $\eta$ (chosen to be 0.52 in their work).
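The four definitions differ only in the right-hand side of the implication. The sketch below evaluates that implication at a single point `x`, assuming the $L_\infty$ metric for $|\cdot - \cdot|$ and the Euclidean norm for $\|\cdot - \cdot\|$. A sampler or attack can use these checks only to falsify the universally quantified definitions; proving them requires a verifier such as Marabou. All names are hypothetical.

```python
import numpy as np


def dist_in(x, x_hat):
    """Input metric |x - x_hat|: assumed here to be the L-infinity distance."""
    return float(np.max(np.abs(np.asarray(x, float) - np.asarray(x_hat, float))))


def dist_out(a, b):
    """Output metric ||a - b||: assumed here to be the Euclidean distance."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))


def cr(f, x_hat, x, y, eps):
    # Implication: outside the eps-ball the constraint holds vacuously.
    return dist_in(x, x_hat) > eps or int(np.argmax(f(x))) == y


def sr(f, x_hat, x, eps, delta):
    return dist_in(x, x_hat) > eps or dist_out(f(x), f(x_hat)) <= delta


def lr(f, x_hat, x, eps, L):
    return dist_in(x, x_hat) > eps or dist_out(f(x), f(x_hat)) <= L * dist_in(x, x_hat)


def scr(f, x_hat, x, y, eps, eta):
    return dist_in(x, x_hat) > eps or float(f(x)[y]) > eta
```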

We note that sometimes the constraints (and therefore the derived loss functions) refer to the true label $y$ rather than the current output of the network $f(\hat{x})$, e.g. $\forall x: |x - \hat{x}| \le \epsilon \Rightarrow |f(x) - y| \le \delta$. This leads to scenarios where a network that is robust around $\hat{x}$ but gives the wrong prediction is penalised by $\mathcal{L}_C$, which on paper is designed to maximise robustness. Essentially, $\mathcal{L}_C$ then tries to maximise both accuracy and constraint adherence concurrently. Instead, we argue that, to preserve the intended semantics of $\alpha$ and $\beta$, it is important to compare against the current output of the network. Of course, this does not work for SCR, because deriving the most popular class from the output $f(\hat{x})$ requires the arg max operator, the very function that SCR seeks to avoid. This is another argument for why (S)CR should be avoided if possible.
3 Robustness in Evaluation, Attack and Verification


Given a particular definition of robustness, a natural question is how to quantify how close a given network is to satisfying it. We argue that there are three different measures that one should be interested in: 1. Does the constraint hold? This is a binary measure, and the answer is either true or false. 2. If the constraint does not hold, how easy is it for an attacker to find a violation? 3. If the constraint does not hold, how often does the average user encounter a violation? Based on these measures, we define three concrete metrics: constraint satisfaction, constraint security, and constraint accuracy.¹

Let $\mathcal{X}$ be the training dataset, $B(\hat{x}, \epsilon) = \{x \in \mathbb{R}^n \mid |x - \hat{x}| \le \epsilon\}$ be the $\epsilon$-ball around $\hat{x}$, and $P$ be the right-hand side of the implication in each of the definitions of robustness. Let $\mathbb{I}_\phi$ be the standard indicator function, which is 1 if constraint $\phi(x)$ holds and 0 otherwise. The constraint satisfaction metric measures the proportion of the (finite) training dataset for which the constraint holds.


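Definition 5 (Constraint satisfaction).


$\mathrm{CSat}(\mathcal{X}) = \frac{1}{|\mathcal{X}|} \sum_{\hat{x} \in \mathcal{X}} \mathbb{I}_{\forall x \in B(\hat{x}, \epsilon):\, P(x)}$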


In contrast, constraint security measures the proportion of inputs in the dataset such that 
an attack A is unable to find an adversarial example for constraint P. In our experiments 
we use the PGD attack for A, although in general any strong attack can be used. 


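Definition 6 (Constraint security).


$\mathrm{CSec}(\mathcal{X}) = \frac{1}{|\mathcal{X}|} \sum_{\hat{x} \in \mathcal{X}} \mathbb{I}_{P(\mathcal{A}(\hat{x}))}$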


Finally, constraint accuracy estimates the probability of a random user coming across a counter-example to the constraint, usually referred to as 1 - success rate in the robustness literature. Let $S(\hat{x}, n)$ be a set of $n$ elements sampled uniformly at random from $B(\hat{x}, \epsilon)$. Then constraint accuracy is defined as:


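Definition 7 (Constraint accuracy).


$\mathrm{CAcc}(\mathcal{X}) = \frac{1}{|\mathcal{X}|} \sum_{\hat{x} \in \mathcal{X}} \frac{1}{n} \sum_{x \in S(\hat{x}, n)} \mathbb{I}_{P(x)}$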
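The two sampling-based metrics can be sketched directly from Definitions 6 and 7. This sketch assumes the $L_\infty$ ball (so uniform sampling is a per-coordinate uniform draw), passes $P$ as a hypothetical predicate `P(x_hat, x)` because the right-hand side depends on both points, and treats `attack` as any function returning one candidate counterexample (PGD in the experiments). Constraint satisfaction is omitted because it needs a verifier.

```python
import numpy as np


def constraint_security(X, P, attack):
    """Fraction of dataset points x_hat for which P still holds at attack(x_hat),
    i.e. the attack fails to find a counterexample."""
    return float(np.mean([P(x_hat, attack(x_hat)) for x_hat in X]))


def constraint_accuracy(X, P, eps, n, rng):
    """Average over the dataset of the fraction of n uniform samples from the
    L-infinity eps-ball around x_hat that satisfy P."""
    scores = []
    for x_hat in X:
        x_hat = np.asarray(x_hat, dtype=float)
        samples = x_hat + rng.uniform(-eps, eps, size=(n,) + x_hat.shape)
        scores.append(np.mean([P(x_hat, x) for x in samples]))
    return float(np.mean(scores))
```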


Note that there is no general relationship between constraint accuracy and constraint security: an attacker may succeed in finding an adversarial example where random sampling fails, and vice versa. Also note the role of sampling in this discussion, and compare it to the discussion of choice c3 in Sect. 2. Firstly, sampling procedures affect both the training and the evaluation of networks. But at the same time, their choice is orthogonal to choosing the verification constraint for which we optimise or evaluate. For example, we measure constraint security with respect to the PGD attack, and this determines the way we sample; but having made that choice still leaves us to decide which constraint (SCR, SR, LR, or another) we will be measuring as we sample. Constraint satisfaction differs from constraint security and accuracy in that it must evaluate constraints over infinite domains rather than merely sampling from them.

¹ Our naming scheme differs from [4], who use the term constraint accuracy to refer to what we term constraint security. In our opinion, the term constraint accuracy is less appropriate here than the name constraint security, given the use of an adversarial attack.


Choosing an Evaluation Metric. It is important to note that for all three evaluation 
metrics, one still has to make a choice for constraint P, namely SR, SCR or LR, as 
defined in Sect. 2. As constraint security always uses PGD to find input perturbations, 
the choice of SR, SCR and LR effectively amounts to us making a judgement of what 
an adversarial perturbation consists of: is it a class change as defined by SCR, or is 
it a violation of the more nuanced metrics defined by SR and LR? Therefore we will 
evaluate constraint security on the SR/SCR/LR constraints using a PGD attack. 

For large search spaces in $n$ dimensions, the random sampling deployed in constraint accuracy fails to find the trickier adversarial examples and usually shows deceptively high performance: we found 100% and >98% constraint accuracy for SR and SCR, respectively. We will therefore not discuss these experiments in detail.


4 Relative Comparison of Definitions of Robustness 


We now compare the strength of the given definitions of robustness using the introduced metrics. For empirical evaluation, we train networks on FASHION MNIST (or just FASHION) [19] and a modified version of the GTSRB [16] dataset, consisting of 28 × 28 and 48 × 48 images, respectively, belonging to 10 classes. The networks consist of two fully connected layers: the first has 100 neurons with ReLU as activation function, and the last has 10 neurons, to which we apply a clamp function [-100, 100], because the traditional softmax function is not compatible with constraint verification tools such as Marabou. Taking four different robustness properties for which we optimise while training (Baseline, LR, SR, SCR) gives us 8 different networks to train, evaluate and attack. Generally, all trends we observed for the two data sets were the same, and we put matching graphs in [3] whenever we report a result for one of the data sets. Marabou [8] was used for evaluating constraint satisfaction.
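The network architecture described above can be sketched as a NumPy forward pass. The weight initialisation and the helper names are assumptions; only the layer sizes, the ReLU activation, and the clamp to [-100, 100] come from the text. For FASHION, `n_in` is 28 × 28 = 784.

```python
import numpy as np


def init_params(n_in, rng, hidden=100, n_out=10):
    """Random weights for the two fully connected layers (the scaling is an assumption)."""
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(hidden, n_in))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(n_out, hidden))
    return W1, np.zeros(hidden), W2, np.zeros(n_out)


def forward(params, image):
    """Flatten the image, apply 100 ReLU units, then 10 outputs clamped to [-100, 100]."""
    W1, b1, W2, b2 = params
    x = np.asarray(image, dtype=float).reshape(-1)
    h = np.maximum(0.0, W1 @ x + b1)
    logits = W2 @ h + b2
    return np.clip(logits, -100.0, 100.0)  # clamp used instead of softmax
```

The clamp is piecewise linear, like ReLU, which is what keeps the whole network within the fragment that verifiers such as Marabou handle.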


4.1 Standard and Lipschitz Robustness 


Lipschitz robustness is a strictly stronger constraint than standard robustness, in the sense that when a network satisfies $\mathrm{LR}(\epsilon, L)$, it also satisfies $\mathrm{SR}(\epsilon, \epsilon L)$. However, the converse does not hold, as standard robustness does not relate the distances between the inputs and the outputs. Consequently, there are $\mathrm{SR}(\epsilon, \delta)$-robust models that are not $\mathrm{LR}(\epsilon, L)$-robust for any $L$, as for any fixed $L$ one can always make the distance $|x - \hat{x}|$ arbitrarily small in order to violate the Lipschitz inequality.
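The forward implication follows in one step from the definitions, since every $x$ in the $\epsilon$-ball satisfies $|x - \hat{x}| \le \epsilon$ (and $L \ge 0$):

```latex
\begin{aligned}
\mathrm{LR}(\epsilon, L, \hat{x})
  &\;\Leftrightarrow\; \forall x : |x - \hat{x}| \le \epsilon \Rightarrow \|f(x) - f(\hat{x})\| \le L\,|x - \hat{x}| \\
  &\;\Rightarrow\; \forall x : |x - \hat{x}| \le \epsilon \Rightarrow \|f(x) - f(\hat{x})\| \le L\,\epsilon \\
  &\;\Leftrightarrow\; \mathrm{SR}(\epsilon, \epsilon L, \hat{x})
\end{aligned}
```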
Table 1. Constraint satisfaction results for the Classification, Standard and Lipschitz constraints. These values are calculated over the test set and represented as %.

                  FASHION net trained with:      | GTSRB net trained with:
                  Baseline  SCR   SR    LR       | Baseline  SCR   SR    LR
CR satisfaction   1.5       2.0   2.0   34.0     | 0.5       1.0   3.0   4.5
SR satisfaction   0.5       10    65.8  100.0    | 0.0       0.0   24.0  97.0
LR satisfaction   0.0       0.0   0.0   0.0      | 0.0       0.0   0.0   0.0
Empirical Significance of the Conclusions for Constraint Security. Figure 2 shows an empirical evaluation of this general result. If we train two neural networks, one with the SR and the other with the LR constraint, then the latter always has higher constraint security against both SR and LR attacks than the former. It also confirms that, generally, stronger constraints are harder to obtain: whether a network is trained with SR or LR constraints, it is less robust against an LR attack than against any other attack.

[Fig. 2 panels: "Neural net trained with Constraint Loss (LR) (FASHION)" and "Neural net trained with Constraint Loss (SR) (FASHION)"; x-axis: Epsilon (0.000 to 0.100); y-axis: Constraint Security; curves: SR Attack, SCR Attack, LR Attack]

Fig. 2. Experiments that show how the two networks trained with LR and SR constraints perform when evaluated against different definitions of robustness underlying the attack; $\epsilon$ measures the attack strength.

Empirical Significance of the Conclusions for Constraint Satisfaction. Table 1 shows that LR is very difficult to guarantee as a verification property; indeed, none of our networks satisfied this constraint for any image in the data set. At the same time, networks trained with LR satisfy the weaker property SR for 100% (FASHION) and 97% (GTSRB) of images, a huge improvement on the negligible percentage of robust images for the baseline network. Therefore, knowing a verification property or mode of attack, one can tailor the training accordingly, and training with a stronger constraint gives better results.


4.2 (Strong) Classification Robustness 


Strong classification robustness is designed to over-approximate classification robust- 
ness whilst providing a logical loss function with a meaningful gradient. We work under 
the assumption that the last layer of the classification network is a softmax layer, and 
therefore the output forms a probability distribution. When $\eta > 0.5$, any network that satisfies $\mathrm{SCR}(\epsilon, \eta)$ also satisfies $\mathrm{CR}(\epsilon)$. For $\eta < 0.5$ this relationship breaks down, as the true class may be assigned a probability greater than $\eta$ but may still not be the class with the highest probability. We therefore recommend that one only use values of $\eta > 0.5$ when using strong classification robustness (for example, $\eta = 0.52$ in [4]).
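The reason $\eta > 0.5$ suffices is that, if the true class receives probability at least $\eta > 0.5$, all other classes together receive less than 0.5, so none of them can be the most likely class. The sketch below checks this pointwise, at a single output distribution, and searches random distributions for a counterexample. It assumes that SCR requires the true-class probability to be at least $\eta$; the function names are hypothetical.

```python
import random

def argmax(p):
    return max(range(len(p)), key=lambda i: p[i])

def scr_holds(p, true_class, eta):
    """Assumed pointwise form of SCR: the true class gets probability >= eta."""
    return p[true_class] >= eta

def cr_holds(p, true_class):
    """Pointwise form of CR: the true class is still the predicted class."""
    return argmax(p) == true_class

def find_counterexample(eta, n_classes=3, trials=10_000, seed=0):
    """Random search for a distribution that satisfies SCR but violates CR."""
    rng = random.Random(seed)
    for _ in range(trials):
        weights = [rng.expovariate(1.0) for _ in range(n_classes)]
        total = sum(weights)
        p = [w / total for w in weights]   # uniform sample from the probability simplex
        if scr_holds(p, 0, eta) and not cr_holds(p, 0):
            return p
    return None
```

With $\eta = 0.52$ no counterexample can exist, while with $\eta = 0.3$ one is easy to find, for instance $p = (0.35, 0.40, 0.25)$ with true class 0.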


Empirical Significance of the Conclusions for Constraint Security. Because the CR constraint cannot be used within a loss function, we use data augmentation when training to emulate its effect. First, we confirm our assumptions about the relative inefficiency of using data augmentation compared to adversarial training or training with constraints (see Fig. 3). Surprisingly, neural networks trained with data augmentation give worse results than even the baseline network.

As previously discussed, random uniform sampling struggles to find adversarial inputs in large search spaces. It is logical to expect that using random uniform sampling when training will be less successful than training with sampling that uses FGSM or PGD as heuristics. Indeed, Fig. 3 shows this effect for data augmentation.

[Fig. 3: two panels plotting robustness against Epsilon. First panel, "Robustness against SR Attack (GTSRB)", with networks trained with: Baseline, Data Augmentation (Random Uniform), Data Augmentation (FGSM), Adversarial Training, Constraint Loss (SR). Second panel, "Robustness against SCR Attack (FASHION)", with networks trained with: Baseline, Data Augmentation (Random Uniform), Data Augmentation (FGSM), Constraint Loss (SCR).]
Fig. 3. Experiments that show how adversarial training, training with data augmentation, and training with constraint loss affect standard and classification robustness of networks; $\epsilon$ measures the attack strength.

One may ask whether the trends just described would be replicated
for more complex architectures of neural networks. In particular, data augmentation 
is known to require larger networks. By replicating the results with a large, 18-layer 
convolutional network from [4] (second graph of Fig. 3), we confirm that larger net- 
works handle data augmentation better, and that data augmentation affords improved 
robustness compared to the baseline. Nevertheless, data augmentation still lags behind 
all other modes of constraint-driven training, and thus this major trend remains stable 
across network architectures. The same figure also illustrates our point about the relative 
strength of SCR compared to CR: a network trained with data augmentation (equivalent 
to CR) is more prone to SCR attacks than a network trained with the SCR constraint. 
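To illustrate why uniform sampling is a weak heuristic in high dimensions, the following sketch compares it with FGSM (the fast gradient sign method, which perturbs an input by $\epsilon$ times the sign of the loss gradient) on a hypothetical 100-dimensional linear classifier, within an $L_\infty$ ball. The model, dimensions and names are assumptions chosen for illustration and do not reproduce the experimental setup.

```python
import random

def margin(w, x):
    """Margin of a linear binary classifier; a negative margin means misclassified."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(t):
    return (t > 0) - (t < 0)

def fgsm(w, x, eps):
    # The attack loss is -margin, whose gradient w.r.t. x is -w;
    # FGSM moves eps along the sign of that gradient in every coordinate.
    return [xi + eps * sign(-wi) for wi, xi in zip(w, x)]

def best_uniform(w, x, eps, samples=1000, seed=0):
    """Smallest margin among uniform samples from the L-infinity eps-ball around x."""
    rng = random.Random(seed)
    return min(margin(w, [xi + rng.uniform(-eps, eps) for xi in x]) for _ in range(samples))
```

With `w = [1.0] * 100`, `x = [0.05] * 100` and `eps = 0.1`, FGSM shifts the margin from 5 by $-\epsilon \sum_i |w_i| = -10$ and crosses the decision boundary, whereas 1000 uniform samples barely move it, because the random perturbations largely cancel out.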


Empirical Significance of the Conclusions for Constraint Satisfaction. Although 
Table 1 confirms that training with a stronger property (SCR) does improve the con- 
straint satisfaction of a weaker property (CR), the effect is an order of magnitude smaller 
than what we observed for LR and SR. Indeed, the table suggests that training with the 
LR constraint gives better results for CR constraint satisfaction. This does not contradict our theoretical analysis, but neither does it follow from it.


4.3 Standard vs Classification Robustness 


Given that LR is stronger than SR and SCR is stronger than CR, the obvious question 
is whether there is a relationship between these two groups. In short, the answer to this 
question is no. In particular, although the two sets of definitions agree on whether a 
network is robust around images with high confidence, they disagree over whether a
network is robust around images with low confidence. We illustrate this with an exam- 
ple, comparing SR against CR. We note that a similar analysis holds for any pairing 
from the two groups. 

The key insight is that standard robustness bounds the drop in confidence that a neural network can exhibit after a perturbation, whereas classification robustness does not. Figure 4 shows two hypothetical images from the MNIST dataset. Our network predicts that Fig. 4a has an 85% chance of being a 7. Now consider adding a small perturbation to the image and consider two different scenarios. In the first scenario, the output of the network for class 7 decreases from 85% to 83%, and therefore the classification stays the same. In the second scenario, the output of the network for class 7 decreases from 85% to 45%, and the classification changes from 7 to 9. Considering the two definitions, the small change in the output leads to no change in the classification, and the large change in the output leads to a change in classification, so standard robustness and classification robustness agree with each other.

Fig. 4. Images from the MNIST set (panels (a) and (b), each annotated with the network's probability P(7)).

However, now consider Fig. 4b, where there is relatively high uncertainty. In this case the network is (correctly) less sure about the image, only narrowly deciding that it is a 7. Again consider adding a small perturbation. In the first scenario, the prediction of the network changes dramatically, with the probability of it being a 7 increasing from 51% to 91%, but the classification remains 7. In the second scenario, the output of the network changes only very slightly, decreasing from 51% to 49%, yet this flips the classification from 7 to 9. Now the definitions of SR and CR disagree. In the first case, adding a small amount of noise has erroneously and massively increased the network's confidence, and the SR definition correctly identifies that this is a problem. In contrast, CR has no problem with this massive increase in confidence, as the chosen output class remains unchanged. In the second case, the output barely moves, so SR is satisfied whenever $\delta$ exceeds this 2% change, whereas CR reports a violation because the predicted class flips. Thus, SR and CR agree on low-uncertainty examples, but CR breaks down and gives what we argue are both false positives and false negatives when considering examples with high uncertainty.
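The four scenarios of Fig. 4 can be replayed with a small sketch. It assumes, for illustration only, that the network distributes its probability over classes 7 and 9 alone, that SR compares outputs in the $L_\infty$ distance, and that $\delta = 0.1$; the helper names are hypothetical.

```python
def linf(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def predicted(p):
    return max(range(len(p)), key=lambda i: p[i])

def sr_ok(p, q, delta):
    """SR-style check: the output moves by at most delta (L-infinity, assumed)."""
    return linf(p, q) <= delta

def cr_ok(p, q):
    """CR-style check: the predicted class does not change."""
    return predicted(p) == predicted(q)

# Probabilities of the classes (7, 9) before and after the perturbation.
SCENARIOS = {
    "4a, small change": ([0.85, 0.15], [0.83, 0.17]),
    "4a, large change": ([0.85, 0.15], [0.45, 0.55]),
    "4b, large change": ([0.51, 0.49], [0.91, 0.09]),
    "4b, small change": ([0.51, 0.49], [0.49, 0.51]),
}

def verdicts(delta=0.1):
    """Maps each scenario to the pair (SR satisfied, CR satisfied)."""
    return {name: (sr_ok(p, q, delta), cr_ok(p, q)) for name, (p, q) in SCENARIOS.items()}
```

The two checks agree on both scenarios for Fig. 4a and disagree on both scenarios for Fig. 4b.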


Empirical Significance of the Conclusions for Constraint Security. Our empirical 
study confirms these general conclusions. Figure 2 shows that depending on the prop- 
erties of the dataset, SR may not guarantee SCR. The results in Fig. 5 tell us that using 
the SCR constraint for training does not help to increase defences against SR attacks. 
A similar picture, but in reverse, can be seen when we optimize for SR but attack with 
SCR. Table 1 confirms these trends for constraint satisfaction.
5 Other Properties of Robustness Definitions


Table 2. A comparison of the different types of robustness studied in this paper. Top half: general properties. Bottom half: relation to existing machine-learning literature.

Definition                        Standard     Lipschitz    Classification   Strong class.
                                  robustness   robustness   robustness       robustness
Problem domain                    General      General      Classification   Classification
Interpretability                  Medium       Low          High             Medium
Globally desirable                ✓            ✓            ✗                ✗
------------------------------------------------------------------------------------------
Has loss functions                ✓            ✓            ✗                ✓
Adversarial training              ✓            ✗            ✗                ✗
Data augmentation                 ✗            ✗            ✓                ✗
Logical-constraint training [4]   ✓            ✓            ✗                ✓


We finish with a summary of further interesting properties of the four robustness definitions. Table 2 shows a summary of all comparison measures considered in the paper.

[Fig. 5: "Robustness against SR Attack (GTSRB)"; the different lines show the performance of neural networks trained with: Baseline, Constraint Loss (SR), Constraint Loss (SCR), Constraint Loss (LR); y-axis: Constraint Security.]
Fig. 5. Experiments that show how different choices of a constraint loss affect standard robustness of neural networks.

Dataset assumptions concern the distribution of the training data with respect to the data manifold of the true distribution of inputs, and influence the evaluation of robustness. For SR and LR it is, at minimum, desirable for the network to be robust over the entire data manifold. In most domains the shape of the manifold is unknown, and therefore it is necessary to approximate it by taking the union of the balls around the inputs in the training dataset. We are not particularly interested in whether the network is robust in regions of the input space that lie off the data manifold, but there is no problem if the network is robust in these regions. Therefore, these definitions make no assumptions about the distribution of the training dataset.
This is in contrast to CR and SCR. Rather than requiring that there is only a small change in the output, they require that there is no change to the classification. This is only a desirable constraint when the region being considered does not contain a decision boundary. Consequently, when one is training for some form of classification robustness, one is implicitly making the assumption that the training data points lie away from any decision boundaries within the manifold. In practice, most datasets for classification problems assign a single label instead of an entire probability distribution to each input point, and so this assumption is usually valid. However, for datasets that contain input


Neural Network Robustness as a Verification Property 229 


points that may lie close to the decision boundaries, CR and SCR may result in a logi- 
cally inconsistent specification. 


Interpretability. One of the key selling points of training with logical constraints is 
that, by ensuring that the network obeys understandable constraints, it improves the 
explainability of the neural network. Each of the robustness constraints encodes that "small changes to the input only result in small changes to the output", but the interpretability of each definition is also important.

All of the definitions share the relatively interpretable $\epsilon$ parameter, which measures how large a perturbation from the input is acceptable. Despite the other drawbacks discussed so far, CR is inherently the most interpretable, as it has no second parameter. In contrast, SR and SCR require extra parameters, $\delta$ and $\eta$ respectively, which measure the allowable deviation in the output. Their addition makes these definitions less interpretable.

Finally, we argue that, although LR is the most desirable constraint, it is also the least interpretable. Its second parameter $L$ measures the allowable change in the output as a proportion of the allowable change in the input. It therefore requires one not only to have an interpretation of distance for both the input and output spaces, but also to be able to relate them. In most domains, this relationship simply does not exist. Consider the MNIST dataset: the commonly used notion of pixel-wise distance in the input space, although crude, and the distance between the output distributions are both interpretable. However, the relationship between them is not. For example, what does it actually mean to require that the distance between the output probability distributions be no more than twice the distance between the images? This highlights a common trade-off between the complexity of a constraint and its interpretability.


6 Conclusions 


These case studies have demonstrated the importance of emancipating the study of 
desirable properties of neural networks from a concrete training method, and study- 
ing these properties in an abstract mathematical way. For example, we have discovered 
that some robustness properties can be ordered by logical strength and some are incom- 
parable. Where ordering is possible, training for a stronger property helps in verifying 
a weaker property. Some of the stronger properties, such as Lipschitz robustness, are 
not yet feasible for modern DNN solvers such as Marabou [8]. Moreover, we show that the logical strength of a property may not guarantee other desirable properties, such as interpretability. Some of these findings lead to very concrete recommendations, for example: it is best to avoid CR and SCR, as they may lead to inconsistencies; and when using LR and SR, one should use the stronger property (LR) for training in order to be successful in verifying the weaker one (SR). In other cases, the distinctions that we make do not give
direct prescriptions, but merely discuss the design choices and trade-offs. 

This paper also shows that constraint security, a measure intermediate between con- 
straint accuracy and constraint satisfaction, is a useful tool in the context of tuning the 
continuous verification loop. It is more efficient to measure and can show more nuanced 
trends than constraint satisfaction. It can be used to tune training parameters and build 
hypotheses which we ultimately confirm with constraint satisfaction. 
We hope that this study will contribute towards establishing a solid methodology
for continuous verification, by setting up some common principles to unite verification 
and machine learning approaches to DNN robustness. 


Acknowledgement. Authors acknowledge support of EPSRC grant AISEC EP/T026952/1 and 
NCSC grant Neural Network Verification: in search of the missing spec. 
Abstract. We present LT-PDR, a lattice-theoretic generalization of Bradley's property directed reachability analysis (PDR) algorithm. LT-PDR identifies the essence of PDR to be an ingenious combination of verification and refutation attempts based on the Knaster–Tarski and Kleene theorems. We introduce four concrete instances of LT-PDR, derive their implementation from a generic Haskell implementation of LT-PDR, and experimentally evaluate them. We also present a categorical structural theory that derives these instances.


Keywords: Property directed reachability analysis · Model checking · Lattice theory · Fixed point theory · Category theory


1 Introduction 


Property directed reachability (PDR) (also called IC3), introduced in [9,13], is a model checking algorithm for proving/disproving safety problems. It has been successfully applied to software and hardware model checking, and later it has been extended in several directions, including fbPDR [25,26], which uses both forward and backward predicate transformers, and PrIC3 [6] for the quantitative safety problem for probabilistic systems. See [14] for a concise overview.

The original PDR assumes that systems are given by binary predicates representing transition relations. The PDR algorithm maintains data structures called frames and proof obligations (these are collections of predicates over states) and updates them. While this logic-based description immediately yields automated
tools using SAT/SMT solvers, it limits target systems to qualitative and nonde- 
terministic ones. This limitation was first overcome by PrIC3 [6] whose target is 
probabilistic systems. This suggests room for further generalization of PDR. 
In this paper, we propose the first lattice theory-based generalization of the PDR algorithm; we call it LT-PDR. This makes the PDR algorithm apply to a wider class of safety problems, including both qualitative and quantitative ones. We also derive a new concrete extension of PDR, namely one for Markov reward models.

We implemented the general algorithm LT-PDR in Haskell, in a way that 
maintains the theoretical abstraction and clarity. Deriving concrete instances 
for various types of systems is easy (for Kripke structures, probabilistic systems, 
etc.). We conducted an experimental evaluation, which shows that these easily- 
obtained instances have at least reasonable performance. 


Preview of the Theoretical Contribution. We generalize the PDR algorithm so that it operates over an arbitrary complete lattice $L$. This generalization recasts the PDR algorithm to solve a general problem $\mu F \le^? \alpha$ of over-approximating the least fixed point of an $\omega$-continuous function $F\colon L \to L$ by a safety property $\alpha$. This lattice-theoretic generalization makes explicit the relationship between the PDR algorithm and the theory of fixed points. This also allows us to incorporate quantitative predicates suited for probabilistic verification.

More specifically, we reconstruct the original PDR algorithm as a combina- 
tion of two constituent parts. They are called positive LT-PDR and negative 
LT-PDR. Positive LT-PDR comes from a witness-based proof method by the Knaster–Tarski fixed point theorem, and aims to verify $\mu F \le^? \alpha$. In contrast, negative LT-PDR comes from the Kleene fixed point theorem and aims to refute $\mu F \le^? \alpha$. The two algorithms build up witnesses in an iterative and nondeterministic manner, where nondeterminism accommodates guesses and heuristics.
We identify the essence of PDR to be an ingenious combination of these two 
algorithms, in which intermediate results on one side (positive or negative) give 
informed guesses on the other side. This is how we formulate LT-PDR in Sect. 3.3. 

We discuss several instances of our general theory of PDR. We discuss three 
concrete settings: Kripke structures (where we obtain two instances of LT-PDR), 
Markov decision processes (MDPs), and Markov reward models. The two in the 
first setting essentially subsume many existing PDR algorithms, such as the 
original PDR [9,13] and Reverse PDR [25,26], and the one for MDPs resembles 
PrIC3 [6]. The last one (Markov reward models) is a new algorithm that fully 
exploits the generality of our framework. 

In fact, there is another dimension of theoretical generalization: the deriva- 
tion of the above concrete instances follows a structural theory of state-based 
dynamics and predicate transformers. We formulate the structural theory in the 
language of category theory [3,23], using especially coalgebras [17] and fibrations [18], and following works such as [8,15,21,28]. The structural theory tells us
which safety problems arise under what conditions; it can therefore suggest that 
certain safety problems are unlikely to be formulatable, too. The structural the- 
ory is important because it builds a mathematical order in the PDR literature, 
in which theoretical developments tend to be closely tied to implementation and 
thus theoretical essences are often not very explicit. For example, the theory is 
useful in classifying a plethora of PDR-like algorithms for Kripke structures (the 
original, Reverse PDR, fbPDR, etc.). See Sect. 5.1. 
We present the above structural theory in Sect. 4 and briefly discuss its use in the derivation of concrete instances in Sect. 5. We note, however, that this categorical theory is not needed for reading and using the other parts of the paper.

There are other works on generalization of PDR [16,24], but our identification of the interplay of Knaster–Tarski and Kleene is new. Those works do not accommodate probabilistic verification, either. See [22, Appendix A] for further discussions.


Preliminaries. Let $(L, \le)$ be a poset. $(L, \le)^{\mathrm{op}}$ denotes the opposite poset $(L, \ge)$. Note that if $(L, \le)$ is a complete lattice then so is $(L, \le)^{\mathrm{op}}$. An $\omega$-chain (resp. $\omega^{\mathrm{op}}$-chain) in $L$ is an $\mathbb{N}$-indexed family of increasing (resp. decreasing) elements in $L$. A monotone function $F\colon L \to L$ is $\omega$-continuous (resp. $\omega^{\mathrm{op}}$-continuous) if $F$ preserves existing suprema of $\omega$-chains (resp. infima of $\omega^{\mathrm{op}}$-chains).


2 Fixed-points in Complete Lattices 


Let $(L, \le)$ be a complete lattice and $F\colon L \to L$ be a monotone function. When
we analyze fixed points of F, pre/postfixed points play important roles. 


Definition 2.1. A prefixed point of $F$ is an element $x \in L$ satisfying $Fx \le x$. A postfixed point of $F$ is an element $x \in L$ satisfying $x \le Fx$. We write $\mathrm{Pre}(F)$ and $\mathrm{Post}(F)$ for the set of prefixed points and postfixed points of $F$, respectively.


The following results are central in fixed point theory. They allow us to under-/over-approximate the least/greatest fixed points.


Theorem 2.2. A monotone endofunction $F$ on a complete lattice $(L, \le)$ has the least fixed point $\mu F$ and the greatest fixed point $\nu F$. Moreover,

1. (Knaster–Tarski [30]) The set of fixed points forms a complete lattice. Furthermore, $\mu F = \bigwedge\{x \in L \mid Fx \le x\}$ and $\nu F = \bigvee\{x \in L \mid x \le Fx\}$.

2. (Kleene, see e.g. [5]) If $F$ is $\omega$-continuous, $\mu F = \bigvee_{n \in \mathbb{N}} F^n\bot$. Dually, if $F$ is $\omega^{\mathrm{op}}$-continuous, $\nu F = \bigwedge_{n \in \mathbb{N}} F^n\top$.
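Theorem 2.2 can be checked by brute force on a small finite lattice, where every monotone map is $\omega$- and $\omega^{\mathrm{op}}$-continuous because every chain is eventually constant. The sketch below uses the powerset lattice of five states and a reachability-style operator over a hypothetical transition relation `SUCC` (in the spirit of Example 3.2 below); it computes $\mu F$ and $\nu F$ both by Kleene iteration from $\bot$ and $\top$ and by the Knaster–Tarski characterizations.

```python
from itertools import combinations

STATES = frozenset(range(5))
SUCC = {0: {1}, 1: {2}, 2: {2}, 3: {0}, 4: {4}}   # hypothetical transition relation

def F(xs):
    """Monotone map on the powerset lattice: initial state 0 plus successors of xs."""
    out = {0}
    for s in xs:
        out |= SUCC[s]
    return frozenset(out)

def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def iterate(F, start):
    """Kleene iteration: start, F(start), F(F(start)), ... until it stabilises."""
    x = start
    while F(x) != x:
        x = F(x)
    return x

def lfp_kt(F, universe):
    """Knaster-Tarski: meet (intersection) of all prefixed points F x <= x."""
    result = frozenset(universe)
    for x in powerset(universe):
        if F(x) <= x:
            result &= x
    return result

def gfp_kt(F, universe):
    """Knaster-Tarski: join (union) of all postfixed points x <= F x."""
    result = frozenset()
    for x in powerset(universe):
        if x <= F(x):
            result |= x
    return result
```

Both methods give $\mu F = \{0, 1, 2\}$ and $\nu F = \{0, 1, 2, 4\}$: state 4 is unreachable but lies on a self-loop, so it belongs to the greatest fixed point only.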


Theorem 2.2.2 is known to hold for arbitrary $\omega$-cpos (complete lattices are their special case). A generalization of Theorem 2.2.2 is the Cousot–Cousot characterization [11], where $F$ is assumed to be monotone (but not necessarily $\omega$-continuous) and we have $\mu F = F^{\kappa}\bot$ for a sufficiently large, possibly transfinite, ordinal $\kappa$. In this paper, for the algorithmic study of PDR, we assume the $\omega$-continuity of $F$. Note that an $\omega$-continuous $F$ on a complete lattice is necessarily monotone.

We call the $\omega$-chain $\bot \le F\bot \le \cdots$ the initial chain of $F$ and the $\omega^{\mathrm{op}}$-chain $\top \ge F\top \ge \cdots$ the final chain of $F$. These appear in Theorem 2.2.2.

Theorems 2.2.1 and 2.2.2 yield the following witness notions for proving and disproving $\mu F \le \alpha$, respectively.


Corollary 2.3. Let $(L, \le)$ be a complete lattice and $F\colon L \to L$ be $\omega$-continuous.

1. (KT) $\mu F \le \alpha$ if and only if there is $x \in L$ such that $Fx \le x \le \alpha$.
2. (Kleene) $\mu F \not\le \alpha$ if and only if there are $n \in \mathbb{N}$ and $x \in L$ such that $x \le F^n\bot$ and $x \not\le \alpha$.
By Corollary 2.3.1, proving $\mu F \le \alpha$ is reduced to searching for $x \in L$ such that $Fx \le x \le \alpha$. We call such $x$ a KT (positive) witness. In contrast, by Corollary 2.3.2, disproving $\mu F \le \alpha$ is reduced to searching for $n \in \mathbb{N}$ and $x \in L$ such that $x \le F^n\bot$ and $x \not\le \alpha$. We call such $x$ a Kleene (negative) witness.
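The two witness notions of Corollary 2.3 are easy to check mechanically. The sketch below uses the finite chain $\{0, 1, \ldots, 10\}$ as $L$ (so $\bot = 0$, meets are minima and joins are maxima) and a hypothetical monotone map with $\mu F = 5$; both choices are assumptions made for illustration.

```python
TOP = 10   # the finite chain 0 <= 1 <= ... <= 10, with bottom 0 and top 10

def F(x):
    """Hypothetical monotone map on the chain."""
    return min(TOP, x // 2 + 3)

def is_kt_witness(x, alpha):
    """Corollary 2.3.1: F x <= x <= alpha proves mu F <= alpha."""
    return F(x) <= x <= alpha

def is_kleene_witness(n, x, alpha):
    """Corollary 2.3.2: x <= F^n(bottom) and not x <= alpha refutes mu F <= alpha."""
    y = 0
    for _ in range(n):
        y = F(y)
    return x <= y and not x <= alpha

def find_witness(alpha):
    """Exhaustively look for a KT witness; otherwise walk up the initial chain."""
    for x in range(TOP + 1):
        if is_kt_witness(x, alpha):
            return ("KT", x)
    # No KT witness exists, so mu F exceeds alpha and the initial chain eventually does too.
    y, n = 0, 0
    while y <= alpha:
        y, n = F(y), n + 1
    return ("Kleene", n, y)
```

For $\alpha = 6$ the search returns the KT witness 5 (indeed $F(5) = 5$); for $\alpha = 4$ it returns the Kleene witness $F^3\bot = 5 \not\le 4$.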


Notation 2.4. We shall use lowercase (Roman and Greek) letters for elements of $L$ (such as $\alpha, x \in L$), and uppercase letters for (finite or infinite) sequences of $L$ (such as $X \in L^*$ or $L^\omega$). The $i$-th (or $(i-j)$-th when subscripts are started from $j$) element of a sequence $X$ is designated by a subscript: $X_i \in L$.


3 Lattice-Theoretic Reconstruction of PDR 


Towards the LT-PDR algorithm, we first introduce two simpler algorithms, called 
positive LT-PDR (Sect. 3.1) and negative LT-PDR (Sect. 3.2). The target problem of the LT-PDR algorithm is the following:


Definition 3.1 (the LFP-OA problem $\mu F \le^? \alpha$). Let $L$ be a complete lattice, $F\colon L \to L$ be $\omega$-continuous, and $\alpha \in L$. The lfp over-approximation (LFP-OA) problem asks if $\mu F \le \alpha$ holds; the problem is denoted by $\mu F \le^? \alpha$.


Example 3.2. Consider a transition system, where $S$ is the set of states, $\iota \subseteq S$ is the set of initial states, $\delta\colon S \to \mathcal{P}S$ is the transition relation, and $\alpha \subseteq S$ is the set of safe states. Then, letting $L := \mathcal{P}S$ and $F := \iota \cup \bigcup_{s \in (-)} \delta(s)$, the lfp over-approximation problem $\mu F \le^? \alpha$ is the problem of whether all reachable states are safe. It is equal to the problem studied by the conventional IC3/PDR [9,13].


Positive LT-PDR iteratively builds a KT witness in a bottom-up manner 
that positively answers the LFP-OA problem, while negative LT-PDR iteratively 
builds a Kleene witness for the same LFP-OA problem. We shall present these 
two algorithms as clear reflections of two proof principles (Corollary 2.3), each 
of which comes from the fundamental Knaster–Tarski and Kleene theorems.

The two algorithms build up witnesses in an iterative and nondeterministic 
manner. The nondeterminism is there for accommodating guesses and heuristics. 
We identify the essence of PDR to be an ingenious combination of these two 
algorithms, in which intermediate results on one side (positive or negative) give 
informed guesses on the other side. This way, each of the positive and negative 
algorithms provides heuristics in resolving the nondeterminism in the execution 
of the other. This is how we formulate the LT-PDR algorithm in Sect. 3.3. 

The dual of the LFP-OA problem is called the gfp-under-approximation problem (GFP-UA): the GFP-UA problem for a complete lattice $L$, an $\omega^{\mathrm{op}}$-continuous function $F\colon L \to L$ and $\alpha \in L$ is whether the inequality $\alpha \le \nu F$ holds or not, and is denoted by $\alpha \le^? \nu F$. It is evident that the GFP-UA problem for $(L, F, \alpha)$ is equivalent to the LFP-OA problem for $(L^{\mathrm{op}}, F, \alpha)$. This suggests the dual algorithm called LT-OpPDR for the GFP-UA problem. See Remark 3.24 later.
3.1 Positive LT-PDR: Sequential Positive Witnesses


We introduce the notion of $\mathrm{KT}^\omega$ witness: a KT witness (Corollary 2.3) constructed in a sequential manner. Positive LT-PDR searches for a $\mathrm{KT}^\omega$ witness by growing its finitary approximations (called KT sequences).

Let $L$ be a complete lattice. We regard each element $x \in L$ as an abstract presentation of a predicate on states. The inequality $x \le y$ means that the predicate $x$ is stronger than the predicate $y$. We introduce the complete lattice $[n, L]$ of increasing chains of length $n \in \mathbb{N}$, whose elements are $(X_0 \le \cdots \le X_{n-1})$ in $L$ equipped with the element-wise order. We similarly introduce the complete lattice $[\omega, L]$ of $\omega$-chains in $L$. We lift $F\colon L \to L$ to $F^\#\colon [\omega, L] \to [\omega, L]$ and $F^\#\colon [n, L] \to [n, L]$ (for $n \ge 2$) as follows. Note that the entries are shifted.

$$\begin{aligned} F^\#(X_0 \le X_1 \le \cdots) &= (\bot \le FX_0 \le FX_1 \le \cdots) \\ F^\#(X_0 \le \cdots \le X_{n-1}) &= (\bot \le FX_0 \le \cdots \le FX_{n-2}) \end{aligned} \tag{1}$$
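Definition 3.3 ($\mathrm{KT}^\omega$ witness). Let $L, F, \alpha$ be as in Definition 3.1. Define $\Delta\alpha := (\alpha \le \alpha \le \cdots)$. A $\mathrm{KT}^\omega$ witness is $X \in [\omega, L]$ such that $F^\# X \le X \le \Delta\alpha$.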


Theorem 3.4. Let $L, F, \alpha$ be as in Definition 3.1. There exists a KT witness (Corollary 2.3) if and only if there exists a $\mathrm{KT}^\omega$ witness.


Concretely, a KT witness $x$ yields a $\mathrm{KT}^\omega$ witness $x \le x \le \cdots$; a $\mathrm{KT}^\omega$ witness $X$ yields a KT witness $\bigvee_{n \in \omega} X_n$. A full proof (via Galois connections) is in [22].

The initial chain $\bot \le F\bot \le \cdots$ is always a $\mathrm{KT}^\omega$ witness for $\mu F \le \alpha$ (whenever $\mu F \le \alpha$ holds). There are other $\mathrm{KT}^\omega$ witnesses whose growth is accelerated by some heuristic guesses; an extreme example is $x \le x \le \cdots$ with a KT witness $x$. $\mathrm{KT}^\omega$ witnesses embrace the spectrum of such different sequential witnesses for $\mu F \le \alpha$, those which mix routine constructions (i.e. application of $F$) and heuristic guesses.


Definition 3.5 (KT sequence). Let $L, F, \alpha$ be as in Definition 3.1. A KT sequence for $\mu F \le^? \alpha$ is a finite chain $(X_0 \le \cdots \le X_{n-1})$, for $n \ge 2$, satisfying

1. $X_{n-2} \le \alpha$; and
2. $X$ is a prefixed point of $F^\#$, that is, $FX_i \le X_{i+1}$ for each $i \in [0, n-2]$.

A KT sequence $(X_0 \le \cdots \le X_{n-1})$ is conclusive if $X_{j+1} \le X_j$ for some $j$.


KT sequences are finite by definition. Note that the upper bound $\alpha$ is imposed on all $X_i$ but $X_{n-1}$. This freedom in the choice of $X_{n-1}$ offers room for heuristics, one that is exploited in the combination with negative LT-PDR (Sect. 3.3).
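Definition 3.5 and the lifting $F^\#$ of Eq. (1) translate directly into a checker. The sketch below works in the powerset lattice of a hypothetical four-state system (initial state 0, unsafe state 3), with the reachability operator of Example 3.2; the names `lift`, `is_kt_sequence` and `is_conclusive` are illustrative.

```python
STATES = frozenset(range(4))
SUCC = {0: {1}, 1: {1}, 2: {3}, 3: {3}}   # hypothetical system; state 3 is unsafe
ALPHA = frozenset({0, 1, 2})

def F(xs):
    """Reachability operator of Example 3.2 with initial state 0."""
    out = {0}
    for s in xs:
        out |= SUCC[s]
    return frozenset(out)

def lift(F, chain):
    """F^# on a finite chain: (X_0, ..., X_{n-1}) -> (bottom, F X_0, ..., F X_{n-2})."""
    return [frozenset()] + [F(x) for x in chain[:-1]]

def is_kt_sequence(F, chain, alpha):
    n = len(chain)
    if n < 2:
        return False
    is_chain = all(chain[i] <= chain[i + 1] for i in range(n - 1))
    bounded = chain[n - 2] <= alpha                                # Definition 3.5.1
    prefixed = all(a <= b for a, b in zip(lift(F, chain), chain))  # F^# X <= X
    return is_chain and bounded and prefixed

def is_conclusive(chain):
    return any(chain[j + 1] <= chain[j] for j in range(len(chain) - 1))
```

For instance, $(\emptyset, \{0\}, \{0, 1\}, S)$ with $S$ the full state set is a KT sequence but not conclusive; strengthening its last entry to $\{0, 1\}$ makes it conclusive.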

We take KT sequences as finite approximations of $\mathrm{KT}^\omega$ witnesses. This view shall be justified by the partial order ($\preceq$) between KT sequences defined below.


Definition 3.6 (order $\preceq$ between KT sequences). We define a partial order relation $\preceq$ on KT sequences as follows: $(X_0, \ldots, X_{n-1}) \preceq (X'_0, \ldots, X'_{m-1})$ if $n \le m$ and $X_j \ge X'_j$ for each $0 \le j \le n-1$.


The order $X_j \ge X'_j$ represents that $X'_j$ is a stronger predicate (on states) than $X_j$. Therefore $X \preceq X'$ expresses that $X'$ is a longer and stronger/more determined chain than $X$. We obtain $\mathrm{KT}^\omega$ witnesses as their $\omega$-suprema.
Theorem 3.7. Let $L, F, \alpha$ be as in Definition 3.1. The set of KT sequences, augmented with the set of $\mathrm{KT}^\omega$ witnesses $\{X \in [\omega, L] \mid F^\# X \le X \le \Delta\alpha\}$ and ordered by the natural extension of $\preceq$, is an $\omega$-cpo. In this $\omega$-cpo, each $\mathrm{KT}^\omega$ witness $X$ is represented as the supremum of an $\omega$-chain of KT sequences, namely $X = \bigvee_{n \ge 2} X|_n$, where $X|_n \in [n, L]$ is the length-$n$ prefix of $X$.


Proposition 3.8. Let $L, F, \alpha$ be as in Definition 3.1. There exists a $\mathrm{KT}^\omega$ witness if and only if there exists a conclusive KT sequence.


Proof. ($\Rightarrow$): If there exists a $\mathrm{KT}^\omega$ witness, $\mu F \le \alpha$ holds by Corollary 2.3 and Theorem 3.4. Therefore, the "informed guess" $(\mu F \le \mu F)$ gives a conclusive KT sequence. ($\Leftarrow$): When $X$ is a conclusive KT sequence with $X_j = X_{j+1}$, $X_0 \le \cdots \le X_j = X_{j+1} = \cdots$ is a $\mathrm{KT}^\omega$ witness.


The proposition above yields the following partial algorithm that aims to answer the LFP-OA problem positively. It searches for a conclusive KT sequence.


Definition 3.9 (positive LT-PDR). Let $L, F, \alpha$ be as in Definition 3.1. Positive LT-PDR is the algorithm shown in Algorithm 1, which says ‘True’ to the LFP-OA problem $\mu F \le^? \alpha$ if successful.


The rules are designed by the following principles. 

Valid is applied when the current X is conclusive. 

Unfold extends $X$ with $\top$. In fact, we can use any element $x$ satisfying $X_{n-1} \le x$ and $FX_{n-1} \le x$ in place of $\top$ (by the application of Induction with $x$). The condition $X_{n-1} \le \alpha$ is checked to ensure that the extended $X$ satisfies the condition in Definition 3.5.1.

Induction strengthens $X$, replacing the $j$-th element, for $2 \le j \le k$, with its meet with $x$. The first condition $X_k \not\le x$ ensures that this rule indeed strengthens $X$, and the second condition $F(X_{k-1} \wedge x) \le x$ ensures that the strengthened $X$ satisfies the condition in Definition 3.5.2, that is, $F^\# X \le X$ (see the proof in [22]).


Theorem 3.10. Let $L, F, \alpha$ be as in Definition 3.1. Then positive LT-PDR is sound, i.e. if it outputs ‘True’ then $\mu F \le \alpha$ holds.

Moreover, assume $\mu F \le \alpha$ is true. Then positive LT-PDR is weakly terminating (meaning that suitable choices of $x$ when applying Induction make the algorithm terminate).


The last “optimistic termination” is realized by the informed guess $\mu F$ as $x$ in Induction. To guarantee the termination of LT-PDR, it suffices to assume that the complete lattice $L$ is well-founded (no infinite decreasing chain exists in $L$) and there is no strictly increasing $\omega$-chain under $\alpha$ in $L$, although we cannot hope for this assumption in every instance (Sects. 5.2 and 5.3).


Lemma 3.11. Let $L, F, \alpha$ be as in Definition 3.1. If $\mu F \le \alpha$, then for any KT sequence $X$, at least one of the three rules in Algorithm 1 is enabled.

Moreover, for any KT sequence $X$, let $X'$ be obtained by applying either Unfold or Induction. Then $X \preceq X'$ and $X \ne X'$.
Input : An instance (μF ≤? α) of the LFP-OA problem in L
Output : ‘True’ with a conclusive KT sequence
Data: a KT sequence X = (X_0 ≤ ··· ≤ X_{n−1})
Initially: X := (⊥ ≤ F⊥)
repeat (do one of the following)
  Valid  If X_{j+1} ≤ X_j for some j < n − 1, return ‘True’ with the conclusive
         KT sequence X.
  Unfold  If X_{n−1} ≤ α, let X := (X_0 ≤ ··· ≤ X_{n−1} ≤ ⊤).
  Induction  If some k ≥ 2 and x ∈ L satisfy X_k ≰ x and F(X_{k−1} ∧ x) ≤ x,
         let X := X[X_j := X_j ∧ x]_{2≤j≤k}.
until any return value is obtained;


Algorithm 1: positive LT-PDR


Input : An instance (μF ≤? α) of the LFP-OA problem in L
Output : ‘False’ with a conclusive Kleene sequence
Data: a Kleene sequence C = (C_0, ..., C_{n−1})
Initially: C := ()
repeat (do one of the following)
  Candidate  Choose x ∈ L such that x ≰ α, and let C := (x).
  Model  If C_0 = ⊥, return ‘False’ with the conclusive Kleene sequence C.
  Decide  If there exists x such that C_0 ≤ F x, then let C := (x, C_0, ..., C_{n−1}).
until any return value is obtained;


Algorithm 2: negative LT-PDR


Input : An instance (μF ≤? α) of the LFP-OA problem in L
Output : ‘True’ with a conclusive KT sequence, or ‘False’ with a conclusive
Kleene sequence
Data: (X; C) where X is a KT sequence (X_0 ≤ ··· ≤ X_{n−1}), and C is a Kleene
sequence (C_i, C_{i+1}, ..., C_{n−1}) (C is empty if n = i).

Initially: (X; C) := (⊥ ≤ F⊥; ())

repeat (do one of the following)

  Valid  If X_{j+1} ≤ X_j for some j < n − 1, return ‘True’ with the conclusive
         KT sequence X.

  Unfold  If X_{n−1} ≤ α, let (X; C) := (X_0 ≤ ··· ≤ X_{n−1} ≤ ⊤; ()).

  Induction  If some k ≥ 2 and x ∈ L satisfy X_k ≰ x and F(X_{k−1} ∧ x) ≤ x,
         let (X; C) := (X[X_j := X_j ∧ x]_{2≤j≤k}; C).

  Candidate  If C = () and X_{n−1} ≰ α, choose x ∈ L such that x ≤ X_{n−1} and
         x ≰ α, and let (X; C) := (X; (x)).

  Model  If C_1 is defined, return ‘False’ with the conclusive Kleene sequence
         (⊥, C_1, ..., C_{n−1}).

  Decide  If C_i ≤ F X_{i−1}, choose x ∈ L satisfying x ≤ X_{i−1} and C_i ≤ F x, and
         let (X; C) := (X; (x, C_i, ..., C_{n−1})).

  Conflict  If C_i ≰ F X_{i−1}, choose x ∈ L satisfying C_i ≰ x and
         F(X_{i−1} ∧ x) ≤ x, and let
         (X; C) := (X[X_j := X_j ∧ x]_{2≤j≤i}; (C_{i+1}, ..., C_{n−1})).

until any return value is obtained;


Algorithm 3: LT-PDR
Theorem 3.12. Let L, F, α be as in Definition 3.1. Assume that ≤ in L is
well-founded and μF ≤ α. Then, any non-terminating run of positive LT-PDR
converges to a KTω witness (meaning that it gives a KTω witness in ω steps).
Moreover, if there is no strictly increasing ω-chain bounded by α in L, then
positive LT-PDR is strongly terminating.


3.2 Negative PDR: Sequential Negative Witnesses 


We next introduce Kleene sequences as a lattice-theoretic counterpart of proof 
obligations in the standard PDR. Kleene sequences represent a chain of sufficient 
conditions to conclude that certain unsafe states are reachable. 

Definition 3.13 (Kleene sequence). Let L, F, α be as in Definition 3.1.
A Kleene sequence for the LFP-OA problem μF ≤? α is a finite sequence
(C_0, ..., C_{n−1}), for n ≥ 0 (C is empty if n = 0), satisfying

1. C_j ≤ F C_{j−1} for each 1 ≤ j ≤ n − 1;

2. C_{n−1} ≰ α.

A Kleene sequence (C_0, ..., C_{n−1}) is conclusive if C_0 = ⊥. We may use i (0 ≤
i ≤ n) instead of 0 as the starting index of the Kleene sequence C.

When we have a Kleene sequence C = (C_0, ..., C_{n−1}), the chain of implications
(C_j ≤ F^j⊥) ⟹ (C_{j+1} ≤ F^{j+1}⊥) holds for 0 ≤ j < n − 1. Therefore when C is
conclusive, C_{n−1} is a Kleene witness (Corollary 2.3.2).

Proposition 3.14. Let L, F, α be as in Definition 3.1. There exists a Kleene
(negative) witness if and only if there exists a conclusive Kleene sequence.
Proof. (⇒): If there exists a Kleene witness x such that x ≤ F^n⊥ and x ≰ α,
then (⊥, F⊥, ..., F^n⊥) is a conclusive Kleene sequence. (⇐): Assume there exists a
conclusive Kleene sequence C. Then C_{n−1} satisfies C_{n−1} ≤ F^{n−1}⊥ and C_{n−1} ≰ α,
because of C_{n−1} ≤ F C_{n−2} ≤ ··· ≤ F^{n−1} C_0 = F^{n−1}⊥ and Definition 3.13.2.
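The (⇒) direction of this proof is constructive. Below is a minimal Python sketch on the powerset lattice 2^S, our own illustrative setting with hypothetical helper names. It builds the Kleene iterates ⊥, F⊥, F²⊥, ... until one of them leaves α, and it checks conditions 3.13.1 and 3.13.2 together with conclusiveness (C_0 = ⊥).

```python
from typing import Callable, FrozenSet, List, Optional

Pred = FrozenSet[int]  # an element of the powerset lattice 2^S


def is_conclusive_kleene(C: List[Pred], F: Callable[[Pred], Pred],
                         alpha: Pred) -> bool:
    """Definition 3.13 (starting index 0) plus the conclusiveness test C_0 = bottom."""
    if not C:
        return False
    chained = all(C[j] <= F(C[j - 1]) for j in range(1, len(C)))  # 3.13.1
    return chained and not C[-1] <= alpha and C[0] == frozenset()  # 3.13.2


def kleene_counterexample(F: Callable[[Pred], Pred],
                          alpha: Pred) -> Optional[List[Pred]]:
    """Return (bottom, F bottom, ..., F^n bottom) with F^n bottom not <= alpha, if any."""
    C: List[Pred] = [frozenset()]
    while C[-1] <= alpha:
        nxt = F(C[-1])
        if nxt == C[-1]:  # muF reached (finite lattice) and muF <= alpha
            return None
        C.append(nxt)
    return C


delta = {(0, 1), (1, 2), (2, 2), (3, 0)}  # hypothetical transition relation


def F(A: Pred) -> Pred:
    """Forward transformer: initial state 0 plus all successors of A."""
    return frozenset({0}) | frozenset(t for (s, t) in delta if s in A)


C = kleene_counterexample(F, frozenset({0, 1}))
```

With α = {0, 1}, C is (∅, {0}, {0,1}, {0,1,2}), a conclusive Kleene sequence whose last element reaches the unsafe state 2. With α = {0, 1, 2}, the iteration stabilizes inside α and no counterexample is returned.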


This proposition suggests the following algorithm to answer the LFP-OA problem
negatively. It searches for a conclusive Kleene sequence. The algorithm
updates a Kleene sequence until its first component becomes ⊥.


Definition 3.15 (negative LT-PDR). Let L, F, α be as in Definition 3.1.
Negative LT-PDR is the algorithm shown in Algorithm 2, which answers ‘False’
to the LFP-OA problem μF ≤? α if successful.


The rules are designed according to the following principles.
Candidate initializes C with only one element x. The element x has to be
chosen such that x ≰ α to ensure Definition 3.13.2.
Model is applied when the current Kleene sequence C is conclusive.
Decide prepends x to C. The condition C_0 ≤ F x ensures Definition 3.13.1.


Theorem 3.16. Let L, F, α be as in Definition 3.1.

1. Negative LT-PDR is sound, i.e. if it outputs ‘False’ then μF ≰ α.

2. Assume μF ≰ α is true. Then negative LT-PDR is weakly terminating (meaning
that suitable choices of x when applying rules Candidate and Decide
make the algorithm terminate).
3.3 LT-PDR: Integrating Positive and Negative


We have introduced two simple PDR algorithms, called positive LT-PDR
(Sect. 3.1) and negative LT-PDR (Sect. 3.2). They are so simple that they have
potential inefficiencies. Specifically, in positive LT-PDR it is unclear how we
should choose x ∈ L in Induction, while negative LT-PDR may easily diverge
because the rules Candidate and Decide may choose x ∈ L that would not
lead to a conclusive Kleene sequence. We resolve these inefficiencies by combining
positive LT-PDR and negative LT-PDR. The combined PDR algorithm is
called LT-PDR, and it is a lattice-theoretic generalization of conventional PDR.

Note that negative LT-PDR is only weakly terminating. Even worse, it is
easy to make it diverge: after a choice of x in Candidate or Decide such that
x ≰ μF, no continued execution of the algorithm can lead to a conclusive Kleene
sequence. For deciding μF ≤? α efficiently, therefore, it is crucial to detect such
useless Kleene sequences.

The core fact that underlies the efficiency of PDR is the following proposition, 
which says that a KT sequence (in positive LT-PDR) can quickly tell that a 
Kleene sequence (in negative LT-PDR) is useless. This fact is crucially used for 
many rules in LT-PDR (Definition 3.20). 


Proposition 3.17. Let C = (C_i, ..., C_{n−1}) be a Kleene sequence (2 ≤ n, 0 <
i ≤ n − 1) and X = (X_0 ≤ ··· ≤ X_{n−1}) be a KT sequence. Then


1. C_i ≰ X_i implies that C cannot be extended to a conclusive one, that is, there
does not exist C_0, ..., C_{i−1} such that (C_0, ..., C_{n−1}) is conclusive.
2. C_i ≰ F X_{i−1} implies that C cannot be extended to a conclusive one.
3. There is no conclusive Kleene sequence with length n − 1.


The proof relies on the following lemmas. 


Lemma 3.18. Any KT sequence (X_0 ≤ ··· ≤ X_{n−1}) over-approximates the
initial sequence: F^i⊥ ≤ X_i holds for any i such that 0 ≤ i ≤ n − 1.


Lemma 3.19. Let C = (C_i, ..., C_{n−1}) be a Kleene sequence (0 ≤ i ≤ n − 1)
and (X_0 ≤ ··· ≤ X_{n−1}) be a KT sequence. The following satisfy 1 ⇔ 2 ⇒ 3.


1. The Kleene sequence C can be extended to a conclusive one.
2. C_i ≤ F^i⊥.
3. C_i ≤ F^j X_{i−j} for each j with 0 ≤ j ≤ i.


Using the above lattice-theoretic properties, we combine positive and nega- 
tive LT-PDRs into the following LT-PDR algorithm. It is also a lattice-theoretic 
generalization of the original PDR algorithm. The combination exploits the 
mutual relationship between KT sequences and Kleene sequences, exhibited as 
Proposition 3.17, for narrowing down choices in positive and negative LT-PDRs. 


Definition 3.20 (LT-PDR). Given a complete lattice L, an ω-continuous
function F: L → L, and an element α ∈ L, LT-PDR is the algorithm shown in
Algorithm 3 for the LFP-OA problem μF ≤? α.
The rules are designed according to the following principles.

(Valid, Unfold, and Induction): These rules are almost the same as in
positive LT-PDR. In Unfold, we reset the Kleene sequence because of
Proposition 3.17.3. Occurrences of Unfold punctuate an execution of the algorithm:
between two occurrences of Unfold, a main goal (towards a negative conclusion)
is to construct a conclusive Kleene sequence with the same length as X.

(Candidate, Model, and Decide): These rules have many similarities to
those in negative LT-PDR. Differences are as follows: the Candidate and
Decide rules impose x ≤ X_i on the new element x in (x, C_{i+1}, ..., C_{n−1}) because
Proposition 3.17.1 tells us that other choices are useless. In Model, we only need
to check whether C_1 is defined instead of C_0 = ⊥. Indeed, since C_1 is added in
Candidate or Decide, C_1 ≤ X_1 = F⊥ always holds. Therefore, 2 ⇒ 1 in
Lemma 3.19 shows that (⊥, C_1, ..., C_{n−1}) is conclusive.

(Conflict): This new rule emerges from the combination of positive and
negative LT-PDRs. This rule is applied when C_i ≰ F X_{i−1}, which confirms that the
current C cannot be extended to a conclusive one (Proposition 3.17.2). Therefore,
we eliminate C_i from C and strengthen X so that we cannot choose C_i
again, that is, so that C_i ≰ (X_i ∧ x). Let us explain how X is strengthened. The
element x has to be chosen so that C_i ≰ x and F(X_{i−1} ∧ x) ≤ x. The former
condition ensures that the strengthened X satisfies C_i ≰ (X_i ∧ x), and the latter
inequality ensures, as in Induction, that the strengthened X is still a KT sequence.
One can see that Conflict is Induction with the additional condition C_i ≰ x,
which narrows down the search space for x using the Kleene sequence C.

Canonical choices of x ∈ L in Candidate, Decide, and Conflict are x :=
X_{n−1}, x := X_{i−1}, and x := F X_{i−1}, respectively. However, there can be cleverer
choices; e.g. x := S \ (C_i \ F X_{i−1}) in Conflict when L = 𝒫S.
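The following Python sketch runs LT-PDR (Algorithm 3) on the powerset lattice 2^S of a finite set, using the canonical choices just listed: Candidate x := X_{n−1}, Decide x := X_{i−1}, and Conflict x := F X_{i−1}. It is a minimal illustration under our own assumptions, not the authors' Haskell implementation. Induction is omitted, which Theorem 3.22 and Proposition 3.23 allow. Valid and Model are always tried first, as Proposition 3.23 requires. The names lt_pdr and forward_F and the example Kripke structure are hypothetical. F is the forward transformer A ↦ ι ∪ post(A) of a Kripke structure.

```python
from typing import Callable, FrozenSet, List, Tuple

Pred = FrozenSet[int]  # an element of the powerset lattice 2^S


def lt_pdr(states: Pred, F: Callable[[Pred], Pred],
           alpha: Pred) -> Tuple[bool, List[Pred]]:
    """LT-PDR on 2^S with canonical choices; Induction omitted.

    Returns (True, conclusive KT sequence) or (False, conclusive Kleene sequence).
    """
    bottom: Pred = frozenset()
    X: List[Pred] = [bottom, F(bottom)]  # KT sequence (X_0 <= ... <= X_{n-1})
    C: List[Pred] = []                   # Kleene sequence (C_i, ..., C_{n-1})
    while True:
        n = len(X)
        i = n - len(C)                   # starting index of C (i = n if C is empty)
        if any(X[j + 1] <= X[j] for j in range(n - 1)):
            return True, X               # Valid
        if C and i == 1:
            return False, [bottom] + C   # Model: (bottom, C_1, ..., C_{n-1})
        if not C:
            if X[-1] <= alpha:
                X = X + [states]         # Unfold: append top (C is already empty)
            else:
                C = [X[-1]]              # Candidate: x := X_{n-1}
        elif C[0] <= F(X[i - 1]):
            C = [X[i - 1]] + C           # Decide: x := X_{i-1}
        else:
            x = F(X[i - 1])              # Conflict: x := F X_{i-1}
            X = [X[j] & x if 2 <= j <= i else X[j] for j in range(n)]
            C = C[1:]                    # drop C_i


def forward_F(delta: FrozenSet[Tuple[int, int]],
              iota: Pred) -> Callable[[Pred], Pred]:
    """F(A) = iota | post(A) for a Kripke structure given as a relation."""
    def F(A: Pred) -> Pred:
        return iota | frozenset(t for (s, t) in delta if s in A)
    return F


S = frozenset(range(4))
delta = frozenset({(0, 1), (1, 2), (2, 2), (3, 0)})
F = forward_F(delta, frozenset({0}))
safe = lt_pdr(S, F, frozenset({0, 1, 2}))  # state 3 is unreachable
unsafe = lt_pdr(S, F, frozenset({0, 1}))   # state 2 is reachable
```

On this example, the safe query returns ‘True’ with (∅, {0}, {0,1}, {0,1,2}, {0,1,2}), where X_4 ≤ X_3 triggers Valid. The unsafe query returns ‘False’ with the conclusive Kleene sequence (∅, {0}, {0,1}, {0,1,2}), whose last element lies outside α = {0, 1}. Each Conflict step removes the unreachable state 3 from a frame. This is how the Kleene side guides the strengthening of X.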


Lemma 3.21. Each rule of LT-PDR, when applied to a pair of a KT and a 
Kleene sequence, yields a pair of a KT and a Kleene sequence. 


Theorem 3.22 (correctness). LT-PDR is sound, i.e. if it outputs ‘True’ then
μF ≤ α holds, and if it outputs ‘False’ then μF ≰ α holds.


Many existing PDR algorithms ensure termination if the state space is finite.
The general principle behind this is stated below. Note that it rarely applies to infinitary
or quantitative settings, where we would need some abstraction for termination.


Proposition 3.23 (termination). LT-PDR terminates regardless of the order
of the rule applications if the following conditions are satisfied.


1. Valid and Model rules are immediately applied if applicable.

2. (L, ≤) is well-founded.

3. Either of the following is satisfied: a) μF ≤ α and (L, ≤) has no strictly
increasing ω-chain bounded by α, or b) μF ≰ α.


Cond. 1 is natural: it just requires LT-PDR to immediately conclude ‘True’ or
‘False’ if it can. Conds. 2 and 3 are always satisfied when L is finite.
Table 1. Categorical modeling of state-based dynamics and predicate transformers

Categorical notion | Meaning (examples)

A transition system as a coalgebra [17] in the base category B of sets and functions:
  objects X, Y, ... in B | sets (in our examples where B = Set)
  an arrow f: X → Y in B | a function (in our examples where B = Set)
  a functor G: B → B | a transition type: G = 𝒫 for Kripke structures (§5.1), G = (𝒟(−) + 1)^Act for MDPs (§5.2), etc.
  a coalgebra δ: S → GS in B [17] | a transition system (Kripke structure, MDP, etc.)

A fibration p: E → B [18] that equips sets in B with predicates:
  the fiber category E_S over S in B | the lattice of predicates over a set S
  the pullback functor l*: E_Y → E_X for l: X → Y in B | substitution P(y) ↦ P(l(x)) in predicates P ∈ E_Y over Y
  a lifting Ġ: E → E of G along p | logical interpretation of the transition type G (specifies e.g. the may vs. must modalities)

The predicate transformer, whose fixed points are of our interest:
  the composite δ*Ġ: E_S → E_S | the predicate transformer associated with the transition system δ

Theorem 3.22 and Proposition 3.23 still hold if the Induction rule is dropped.
However, the rule can accelerate the convergence of KT sequences and improve
efficiency.


Remark 3.24 (LT-OpPDR). The GFP-UA problem α ≤? νF is the dual of
LFP-OA, obtained by reversing the order ≤ in L. We can also dualize the LT-PDR
algorithm (Algorithm 3), obtaining what we call the LT-OpPDR algorithm for
GFP-UA. Moreover, we can express LT-OpPDR as LT-PDR if a suitable involution
¬: L → L^op is present. See [22, Appendix B] for further details; see also
Proposition 4.3.


4 Structural Theory of PDR by Category Theory 


Before we discuss concrete instances of LT-PDR in Sect. 5, we develop a struc- 
tural theory of transition systems and predicate transformers as a basis of LT- 
PDR. The theory is formulated in the language of category theory [3,17,18, 23]. 
We use category theory because 1) categorical modeling of relevant notions is 
well established in the community (see e.g. [2,8,17,18,27]), and 2) it gives us the 
right level of abstraction that accommodates a variety of instances. In particular, 
qualitative and quantitative settings are described in a uniform manner. 

Our structural theory (Sect. 4) serves as a backend, not a frontend. That is, 


— the theory in Sect.4 is important in that it explains how the instances in 
Sect.5 arise and why others do not, but 

— the instances in Sect.5 are described in non-categorical terms, so readers 
who skipped Sect. 4 will have no difficulties following Sect.5 and using those 
instances. 
4.1 Categorical Modeling of Dynamics and Predicate Transformers


Our interests are in instances of the LFP-OA problem μF ≤? α (Definition 3.1)
that appear in model checking. In this context, 1) the underlying lattice L is
that of predicates over a state space, and 2) the function F: L → L arises from
the dynamic/transition structure, specifically as a predicate transformer. The
categorical notions in Table 1 model these ideas (state-based dynamics, predicate
transformers). This modeling is well established in the community.

Our introduction of Table 1 here is minimal, due to the limited space. See
[22, Appendix C] and the references therein for more details.

A category consists of objects and arrows between them. In Table 1, categories
occur twice: 1) a base category B where objects are typically sets and arrows are
typically functions; and 2) fiber categories E_S, defined for each object S of B,
that are identified with the lattices of predicates. Specifically, objects P, Q, ...
of E_S are predicates over S, and an arrow P → Q represents logical implication.
A general fact behind the latter is that every preorder is a category; see e.g. [3].


Transition Systems as Coalgebras. State-based transition systems are modeled
as coalgebras in the base category B [17]. We use a functor G: B → B to
represent a transition type. A G-coalgebra is an arrow δ: S → GS, where S is a
state space and δ describes the dynamics. For example, a Kripke structure can
be identified with a pair (S, δ) of a set S and a function δ: S → 𝒫S, where 𝒫S
denotes the powerset of S. The powerset construction 𝒫 is known to be a functor
𝒫: Set → Set; therefore Kripke structures are 𝒫-coalgebras. For other choices
of G, G-coalgebras become different types of transition systems, such as MDPs
(Sect. 5.2) and Markov Reward Models (Sect. 5.3).


Predicates Form a Fibration. Fibrations are powerful categorical constructs
that can model various indexed entities; see e.g. [18] for their general theory. Our
use of them is for organizing the lattices E_S of predicates over a set S, indexed
by the choice of S. For example, E_S = 2^S (the lattice of subsets of S) for
modeling qualitative predicates. For quantitative reasoning (e.g. for MDPs), we
use E_S = [0,1]^S, where [0,1] is the unit interval. This way, qualitative and
quantitative reasoning are mathematically unified in the language of fibrations.

A fibration is a functor p: E → B with suitable properties; it can be thought
of as a collection (E_S)_{S∈B} of fiber categories E_S, indexed by objects S of B
and suitably organized as a single category E. Notable in this organization is that
we obtain the pullback functor l*: E_Y → E_X for each arrow l: X → Y in B. In
our examples, l* is a substitution along l in predicates: l* is the monotone map
that carries a predicate P(y) over Y to the predicate P(l(x)) over X.

In this paper, we restrict to a subclass of fibrations (called CLat_∧-fibrations)
in which every fiber category E_S is a complete lattice, and each pullback functor
preserves all meets. We therefore write P ≤ Q for arrows in E_S; this represents
logical implication, as announced above. Notice that each f* has a left adjoint
(lower adjoint in terms of Galois connection), which exists by Freyd's adjoint
functor theorem. The left adjoint is denoted by f_*.
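In the powerset fibration used for Kripke structures (E_S = 2^S), both functors are familiar: the pullback l* is inverse image (substitution), and its left adjoint is direct image. The following minimal Python sketch checks exhaustively, on a small hypothetical arrow l: X → Y, that the two form a Galois connection and that l* preserves binary meets (intersections). The helper names are our own.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, List

Pred = FrozenSet[int]  # a predicate over a finite set, as a subset


def subsets(U: Pred) -> List[Pred]:
    """All elements of the powerset lattice 2^U."""
    items = sorted(U)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def pullback(l: Callable[[int], int], X: Pred) -> Callable[[Pred], Pred]:
    """l*: E_Y -> E_X, substitution P(y) |-> P(l(x)), i.e. inverse image."""
    return lambda P: frozenset(x for x in X if l(x) in P)


def push(l: Callable[[int], int]) -> Callable[[Pred], Pred]:
    """Left adjoint of l* in the powerset fibration: direct image."""
    return lambda P: frozenset(l(x) for x in P)


def l(x: int) -> int:  # hypothetical arrow l: X -> Y
    return x % 3


X, Y = frozenset(range(4)), frozenset(range(3))
l_star, l_push = pullback(l, X), push(l)

# Galois connection: l_push(P) <= Q  iff  P <= l_star(Q)
galois = all((l_push(P) <= Q) == (P <= l_star(Q))
             for P in subsets(X) for Q in subsets(Y))
# l_star preserves binary meets (intersections)
meets = all(l_star(Q1 & Q2) == l_star(Q1) & l_star(Q2)
            for Q1 in subsets(Y) for Q2 in subsets(Y))
```

Both galois and meets evaluate to True. For instance, l*({0}) = {0, 3}, since l sends both 0 and 3 to 0.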
We also consider a lifting Ġ: E → E of G along p; it is a functor Ġ such that
pĠ = Gp, i.e. the square formed by p, Ġ, and G commutes. It specifies the logical
interpretation of the transition type G. For example, for G = 𝒫 (the powerset
functor) from the above, two choices of Ġ are for the may and must modalities.
See e.g. [2,15,20,21].


Categorical Predicate Transformer. The above constructs allow us to model
predicate transformers (F in our examples of the LFP-OA problem μF ≤? α)
in categorical terms. A predicate transformer along a coalgebra δ: S → GS with
respect to the lifting Ġ is simply the composite E_S --Ġ--> E_{GS} --δ*--> E_S, where the
first Ġ is the restriction of Ġ: E → E to E_S. Intuitively, 1) given a postcondition
P in E_S, 2) it is first interpreted as the predicate ĠP over GS, and then 3) it is
pulled back along the dynamics δ to yield a precondition δ*ĠP. Such (backward)
predicate transformers are fundamental in a variety of model checking problems.
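For a Kripke structure δ: S → 𝒫S, this composite can be written down directly. In the minimal Python sketch below, which uses our own representation choice, a lifted predicate ĠP over 𝒫S is a membership test on subsets of S. The must lifting accepts A iff A ⊆ P, and the may lifting accepts A iff A meets P. Pulling back along δ then means testing δ(s). All names and the example structure are hypothetical.

```python
from typing import Callable, Dict, FrozenSet

Pred = FrozenSet[int]
Kripke = Dict[int, Pred]          # a P-coalgebra delta: S -> P(S)
Lifted = Callable[[Pred], bool]   # a predicate on P(S), as a membership test


def must_lift(P: Pred) -> Lifted:
    """Must modality: A |-> (A is a subset of P)."""
    return lambda A: A <= P


def may_lift(P: Pred) -> Lifted:
    """May modality: A |-> (A meets P)."""
    return lambda A: bool(A & P)


def transformer(delta: Kripke,
                lift: Callable[[Pred], Lifted]) -> Callable[[Pred], Pred]:
    """delta* after the lifting: lift P to P(S), then pull back along delta."""
    return lambda P: frozenset(s for s, succ in delta.items() if lift(P)(succ))


delta: Kripke = {0: frozenset({1}), 1: frozenset({2}),
                 2: frozenset({2}), 3: frozenset({0, 3})}
box = transformer(delta, must_lift)      # weakest precondition "all successors in P"
diamond = transformer(delta, may_lift)   # "some successor in P"
```

For instance, box({2}) = {1, 2} and diamond({0}) = {3}. The must version preserves meets, whereas diamond({0} ∩ {3}) = ∅ differs from diamond({0}) ∩ diamond({3}) = {3}. This difference matters for the meet-preservation assumption used in Sect. 4.2.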


4.2 Structural Theory of PDR from Transition Systems 


We formulate a few general safety problems. We show how they are amenable 
to the LT-PDR (Definition 3.20) and LT-OpPDR (Remark 3.24) algorithms. 


Definition 4.1 (backward safety problem, BSP). Let p be a CLat_∧-fibration,
δ: S → GS be a coalgebra in B, and Ġ: E → E be a lifting of G
along p such that Ġ_X: E_X → E_{GX} is ω^op-continuous for each X ∈ B. The
backward safety problem for (ι ∈ E_S, δ, α ∈ E_S) in (p, G, Ġ) is the GFP-UA
problem for (E_S, α ∧ δ*Ġ(−), ι), that is,


ι ≤? νx. α ∧ δ*Ġx.   (2)


Here, ι represents the initial states and α represents the safe states. The predicate
transformer x ↦ α ∧ δ*Ġx in (2) is the standard one for modeling safety:
currently safe (α), and the next time x (δ*Ġx). Its gfp is the safety property; (2)
asks if all initial states (ι) satisfy the safety property. Since the backward safety
problem is a GFP-UA problem, we can solve it by LT-OpPDR (Remark 3.24).


Additional assumptions allow us to reduce the backward safety problem to LFP-OA
problems, which are solvable by LT-PDR:

  BSP → GFP-UA → True/False (by LT-OpPDR)
  BSP → LFP-OA (via suitable adjoints or an involution) → True/False (by LT-PDR)


The first case requires the existence of the left adjoint to Ġ_S: E_S → E_{GS} (and
hence to the predicate transformer δ*Ġ_S: E_S → E_S). Then we can translate BSP to
the following LFP-OA problem. It directly asks whether all reachable states are safe.


Proposition 4.2 (forward safety problem, FSP). In the setting of
Definition 4.1, assume that each Ġ_X: E_X → E_{GX} preserves all meets. Then by
letting H_S: E_{GS} → E_S be the left adjoint of Ġ_S, the BSP (2) is equivalent to
the LFP-OA problem for (E_S, ι ∨ H_S δ_*(−), α):


μx. ι ∨ H_S δ_* x ≤? α.   (3)


This problem is called the forward safety problem for (ι, δ, α) in (p, G, Ġ).


The second case assumes that the complete lattice E_S of predicates admits
an involution operator ¬: E_S → E_S^op (cf. [22, Appendix B]).


Proposition 4.3 (inverse backward safety problem, IBSP). In the setting
of Definition 4.1, assume further that there is a monotone function ¬: E_S →
E_S^op satisfying ¬∘¬ = id. Then the backward safety problem (2) is equivalent to
the LFP-OA problem for (E_S, (¬α) ∨ (¬∘δ*Ġ∘¬(−)), ¬ι), that is,


μx. (¬α) ∨ (¬δ*Ġ¬x) ≤? ¬ι.   (4)


We call (4) the inverse backward safety problem for (ι, δ, α) in (p, G, Ġ). Here
(¬α) ∨ (¬∘δ*Ġ∘¬(−)) is the inverse backward predicate transformer.


When both additional assumptions are fulfilled (in Propositions 4.2 and 4.3),
we obtain two LT-PDR algorithms to solve BSP. One can even run these two
algorithms simultaneously; this is done in fbPDR [25,26]. See also Sect. 5.1.


5 Known and New PDR Algorithms as Instances 


We present several concrete instances of our LT-PDR algorithms. The one for
Markov reward models is new (Sect. 5.3). We also sketch how those instances can
be systematically derived by the theory in Sect. 4; details are in [22, Appendix D].


5.1 LT-PDRs for Kripke Structures: PDR^{F-Kr} and PDR^{IB-Kr}


In most of the PDR literature, the target system is a Kripke structure that arises
from a program's operational semantics. A Kripke structure consists of a set S
of states and a transition relation δ ⊆ S × S (here we ignore initial states and
atomic propositions). The basic problem formulation is as follows.


Definition 5.1 (backward safety problem (BSP) for Kripke structures).
The BSP for a Kripke structure (S, δ), a set ι ∈ 2^S of initial states,
and a set α ∈ 2^S of safe states, is the GFP-UA problem ι ≤? νx. α ∧ F′x, where
F′: 2^S → 2^S is defined by F′(A) := {s | ∀s′. ((s, s′) ∈ δ ⇒ s′ ∈ A)}.


It is clear that the GFP in Definition 5.1 represents the set of states from which
all reachable states are in α. Therefore the BSP is the usual safety problem.
The above BSP is easily seen to be equivalent to the following problems.


Proposition 5.2 (forward safety problem (FSP) for Kripke structures).
The BSP in Definition 5.1 is equivalent to the LFP-OA problem μx. ι ∨
F″x ≤? α, where F″: 2^S → 2^S is defined by F″(A) := ⋃_{s∈A} {s′ | (s, s′) ∈ δ}.
Proposition 5.3 (inverse backward safety problem (IBSP) for Kripke
structures). The BSP in Definition 5.1 is equivalent to the LFP-OA problem
μx. ¬α ∨ ¬F′(¬x) ≤? ¬ι, where ¬: 2^S → 2^S is the complement function A ↦
S \ A.
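The three formulations can be cross-checked mechanically on a small example. The minimal Python sketch below uses a hypothetical 4-state Kripke structure and our own helper names. It computes the gfp of Definition 5.1 by descending Kleene iteration from S, and the lfps of Propositions 5.2 and 5.3 by ascending iteration from ∅. It then confirms that all three answers agree for every choice of ι and α in 2^S.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, List

Pred = FrozenSet[int]
S: Pred = frozenset(range(4))
delta = {(0, 1), (1, 2), (2, 2), (3, 0)}  # hypothetical transition relation


def subsets(U: Pred) -> List[Pred]:
    items = sorted(U)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def lfp(G: Callable[[Pred], Pred]) -> Pred:
    x: Pred = frozenset()          # ascending Kleene iteration from bottom
    while G(x) != x:
        x = G(x)
    return x


def gfp(G: Callable[[Pred], Pred]) -> Pred:
    x: Pred = S                    # descending iteration from top
    while G(x) != x:
        x = G(x)
    return x


def F1(A: Pred) -> Pred:           # F' of Definition 5.1 (all successors in A)
    return frozenset(s for s in S if all(t in A for (u, t) in delta if u == s))


def F2(A: Pred) -> Pred:           # F'' of Proposition 5.2 (image of A)
    return frozenset(t for (s, t) in delta if s in A)


def neg(A: Pred) -> Pred:          # complement, the involution of Proposition 5.3
    return S - A


def agree(iota: Pred, alpha: Pred) -> bool:
    bsp = iota <= gfp(lambda x: alpha & F1(x))
    fsp = lfp(lambda x: iota | F2(x)) <= alpha
    ibsp = lfp(lambda x: neg(alpha) | neg(F1(neg(x)))) <= neg(iota)
    return bsp == fsp == ibsp


all_agree = all(agree(i, a) for i in subsets(S) for a in subsets(S))
```

Here all_agree is True over all 256 pairs (ι, α). The FSP computes the reachable set, which is {0, 1, 2} from ι = {0}. The IBSP computes the set of states that can reach S \ α.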


Instances of LT-PDR. The FSP and IBSP (Propositions 5.2 and 5.3), being LFP-OA,
are amenable to the LT-PDR algorithm (Definition 3.20). Thus we obtain
two instances of LT-PDR; we call them PDR^{F-Kr} and PDR^{IB-Kr}. PDR^{IB-Kr}
is a step-by-step dual to the application of LT-OpPDR to the BSP (Definition 5.1);
see Remark 3.24.

We compare these two instances of LT-PDR with algorithms in the literature.
If we impose |C_i| = 1 on each element C_i of Kleene sequences, the PDR^{F-Kr}
instance of LT-PDR coincides with the conventional IC3/PDR [9,13]. In contrast,
PDR^{IB-Kr} coincides with Reverse PDR in [25,26]. The parallel execution
of PDR^{F-Kr} and PDR^{IB-Kr} roughly corresponds to fbPDR [25,26].


Structural Derivation. The equivalent problems (Propositions 5.2 and 5.3) are
derived systematically from the categorical theory in Sect. 4.2. Indeed, using a
lifting 𝒫̇: 2^S → 2^{𝒫S} such that A ↦ {A′ | A′ ⊆ A} (the must modality □), F′ in
Definition 5.1 coincides with δ*𝒫̇ in (2). The above 𝒫̇ preserves meets (cf. the
modal axiom □(φ ∧ ψ) ↔ □φ ∧ □ψ, see e.g. [7]); thus Proposition 4.2 derives
the FSP. Finally, ¬ in Proposition 5.3 allows the use of Proposition 4.3. More
details are in [22, Appendix D].


5.2 LT-PDR for MDPs: PDR^{IB-MDP}


The only known PDR-like algorithm for quantitative verification is PrIC3 [6]
for Markov decision processes (MDPs). Here we instantiate LT-PDR for MDPs
and compare it with PrIC3.

An MDP consists of a set S of states, a set Act of actions and a transition
function δ mapping s ∈ S and a ∈ Act to either * ("the action a is unavailable
at s") or a probability distribution δ(s)(a) over S.


Definition 5.4 (IBSP for MDPs). The inverse backward safety problem
(IBSP) for an MDP (S, δ), an initial state s_ι ∈ S, a real number λ ∈ [0,1],
and a set α ⊆ S of safe states, is the LFP-OA problem μx. F′(x) ≤? d_{ι,λ}.
Here d_{ι,λ}: S → [0,1] is the predicate such that d_{ι,λ}(s_ι) = λ and d_{ι,λ}(s) = 1
otherwise. F′: [0,1]^S → [0,1]^S is defined by F′(d)(s) = 1 if s ∉ α, and
F′(d)(s) = max{Σ_{s′∈S} d(s′) · δ(s)(a)(s′) | a ∈ Act, δ(s)(a) ≠ *} if s ∈ α.


The function F′ in Definition 5.4 is a Bellman operator for MDPs: it takes the
average of d over δ(s)(a) and takes the maximum over a. Therefore the lfp in
Definition 5.4 is the maximum reachability probability to S \ α; the problem asks
if its value at s_ι is ≤ λ. In other words, it asks whether the safety probability (of staying
in α henceforth, under any choices of actions) is ≥ 1 − λ. This problem is the
same as in [6].
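The Kleene side of this problem can be illustrated with plain value iteration. By ω-continuity, the iterates F′^k(⊥) increase towards μF′. If some iterate exceeds λ at s_ι, the bound is refuted. If an exact fixed point is reached from ⊥, it equals μF′ and decides the question. Otherwise finitely many iterates are inconclusive, which is why the positive (KT) side is needed. The minimal Python sketch below uses exact fractions. Its MDP encoding, with unavailable actions simply omitted, and all names are our own assumptions, not PDR^{IB-MDP} itself.

```python
from fractions import Fraction
from typing import Dict, Optional, Set

Dist = Dict[int, Fraction]        # a distribution delta(s)(a) over S
MDP = Dict[int, Dict[str, Dist]]  # s -> available actions; unavailable (*) ones omitted


def bellman(mdp: MDP, alpha: Set[int], d: Dict[int, Fraction]) -> Dict[int, Fraction]:
    """F' of Definition 5.4: 1 outside alpha, max over actions of the average of d inside."""
    out: Dict[int, Fraction] = {}
    for s, acts in mdp.items():
        if s not in alpha:
            out[s] = Fraction(1)
        else:
            out[s] = max((sum((p * d[t] for t, p in dist.items()), Fraction(0))
                          for dist in acts.values()), default=Fraction(0))
    return out


def check_bound(mdp: MDP, alpha: Set[int], s_init: int, lam: Fraction,
                max_iter: int = 1000) -> Optional[bool]:
    """True: muF'(s_init) <= lam; False: > lam; None: undecided."""
    d = {s: Fraction(0) for s in mdp}  # bottom of [0,1]^S
    for _ in range(max_iter):
        if d[s_init] > lam:
            return False  # F'^k(bottom) <= muF', so muF'(s_init) > lam
        nxt = bellman(mdp, alpha, d)
        if nxt == d:
            return True   # fixed point reached from bottom, hence equal to muF'
        d = nxt
    return None


half = Fraction(1, 2)
mdp: MDP = {  # hypothetical MDP; state 2 is unsafe
    0: {"a": {1: half, 2: half}, "b": {0: Fraction(1)}},
    1: {"a": {1: Fraction(1)}},
    2: {"a": {2: Fraction(1)}},
}
```

In this example, the maximum probability of reaching state 2 from state 0 is 1/2. So check_bound(mdp, {0, 1}, 0, Fraction(2, 5)) refutes the bound, and check_bound(mdp, {0, 1}, 0, Fraction(3, 5)) confirms it after three iterations.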
Instance of PDR. The IBSP (Definition 5.4) is LFP-OA and thus amenable to
LT-PDR. We call this instance PDR^{IB-MDP}. See [22, Appendix E] for details.

PDR^{IB-MDP} has much in common with PrIC3 [6]. It uses the operator F′
in Definition 5.4, which coincides with the one in [6, Def. 2]. PrIC3 maintains
frames; they coincide with KT sequences in PDR.

Our Kleene sequences correspond to obligations in PrIC3, modulo the following
difference. Kleene sequences aim at a negative witness (Sect. 3.2), but they
happen to help the positive proof efforts too (Sect. 3.3); obligations in PrIC3
are solely for accelerating the positive proof efforts. Thus, if the positive proof
efforts of PrIC3 do not succeed, one needs to check separately whether its obligations yield a negative witness.

Structural Derivation. One can derive the IBSP (Definition 5.4) from the
categorical theory in Sect. 4.2. Specifically, we first formulate the BSP ¬d_{ι,λ} ≤?
νx. d_α ∧ δ*Ġx, where Ġ is a suitable lifting (of G for MDPs, Table 1) that combines
average and minimum, ¬: [0,1]^S → [0,1]^S is defined by (¬d)(s) := 1 − d(s),
and d_α is such that d_α(s) = 1 if s ∈ α and d_α(s) = 0 otherwise. Using
¬: [0,1]^S → [0,1]^S in the above as an involution, we apply Proposition 4.3
and obtain the IBSP (Definition 5.4).

Another benefit of the categorical theory is that it can tell us that a forward
instance of LT-PDR (much like PDR^{F-Kr} in Sect. 5.1) is unlikely for MDPs.
Indeed, Proposition 4.2 relies on Ġ preserving meets (the existence of a left adjoint
is equivalent to meet preservation). We can easily show that our Ġ for MDPs does
not preserve meets. See [22, Appendix G].
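A tiny counterexample illustrates why meet preservation fails. The minimal Python sketch below assumes, as our own reading of "combines average and minimum", that the lifting evaluates a predicate d at a transition by taking the minimum, over available actions, of the expected value of d. Meets in [0,1]^S are pointwise minima. The exact definition of Ġ is in [22]. The names and the single fair-coin action are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List

Pred = Dict[str, Fraction]  # an element of [0,1]^S
Dist = Dict[str, Fraction]  # a distribution over S


def lifted(d: Pred, actions: List[Dist]) -> Fraction:
    """Assumed shape of the lifting at one transition: min over actions of E[d]."""
    return min(sum((p * d[t] for t, p in dist.items()), Fraction(0))
               for dist in actions)


def meet(d1: Pred, d2: Pred) -> Pred:
    """Meet in [0,1]^S: the pointwise minimum."""
    return {s: min(d1[s], d2[s]) for s in d1}


half = Fraction(1, 2)
actions = [{"s1": half, "s2": half}]  # one action: a fair coin between s1 and s2
d1 = {"s1": Fraction(1), "s2": Fraction(0)}
d2 = {"s1": Fraction(0), "s2": Fraction(1)}
lhs = lifted(meet(d1, d2), actions)                   # lifting of d1 meet d2
rhs = min(lifted(d1, actions), lifted(d2, actions))   # meet of the liftings
```

Here lhs is 0 and rhs is 1/2. Averaging does not commute with pointwise minimum, so no left adjoint exists and Proposition 4.2 does not apply.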


5.3 LT-PDR for Markov Reward Models: PDR^{MRM}


We present a PDR-like algorithm for Markov reward models (MRMs), which
seems to be new, as an instance of LT-PDR. An MRM consists of a set S of
states and a transition function δ that maps s ∈ S (the current state) and c ∈ ℕ
(the reward) to a function δ(s)(c): S → [0,1]; the last represents the probability
distribution of next states.

We solve the following problem. We use [0,∞]-valued predicates,
representing accumulated rewards, where [0,∞] is the set of extended nonnegative
reals.


Definition 5.5 (SP for MRMs). The safety problem (SP) for an MRM (S, δ),
an initial state s_ι ∈ S, λ ∈ [0,∞], and a set α ⊆ S of safe states is μx. F′(x) ≤?
d_{ι,λ}. Here d_{ι,λ}: S → [0,∞] maps s_ι to λ and others to ∞, and F′: [0,∞]^S →
[0,∞]^S is defined by F′(d)(s) = 0 if s ∉ α, and F′(d)(s) = Σ_{s′∈S, c∈ℕ} (c + d(s′)) ·
δ(s)(c)(s′) if s ∈ α.


The function F′ accumulates expected reward in α. Thus the problem asks
if the expected accumulated reward, starting from s_ι and until leaving α, is ≤ λ.
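As for MDPs, Kleene iteration of F′ from ⊥ (all zeros) gives increasing lower bounds on the expected accumulated reward. An iterate exceeding λ at s_ι therefore refutes the bound, while the iterates alone may never settle a true bound. The minimal Python sketch below illustrates this. The MRM encoding as lists of (reward, next state, probability) triples, the names, and the example are our own assumptions.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Set, Tuple

# delta(s) as a list of (reward c, next state s', probability delta(s)(c)(s'))
MRM = Dict[int, List[Tuple[int, int, Fraction]]]


def reward_op(mrm: MRM, alpha: Set[int],
              d: Dict[int, Fraction]) -> Dict[int, Fraction]:
    """F' of Definition 5.5: 0 outside alpha, expected (c + d(s')) inside."""
    return {s: (sum(((c + d[t]) * p for c, t, p in trans), Fraction(0))
                if s in alpha else Fraction(0))
            for s, trans in mrm.items()}


def check_bound(mrm: MRM, alpha: Set[int], s_init: int, lam: Fraction,
                max_iter: int = 50) -> Optional[bool]:
    """False: an iterate exceeds lam; True: exact lfp reached; None: undecided."""
    d = {s: Fraction(0) for s in mrm}  # bottom of [0,inf]^S
    for _ in range(max_iter):
        if d[s_init] > lam:
            return False
        nxt = reward_op(mrm, alpha, d)
        if nxt == d:
            return True
        d = nxt
    return None


half = Fraction(1, 2)
# hypothetical MRM: from state 0, either earn 1 and stay, or earn 2 and move to state 1
mrm: MRM = {0: [(1, 0, half), (2, 1, half)], 1: []}
```

With α = {0}, the expected reward from state 0 is 3, and the iterates are 0, 3/2, 9/4, ..., which approach 3 without reaching it. So check_bound(mrm, {0}, 0, Fraction(2)) refutes the bound after two steps, whereas check_bound(mrm, {0}, 0, Fraction(3)) stays undecided (None). Deciding such valid instances is the job of the KT side of PDR^{MRM}.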


Instance of PDR. The SP (Definition 5.5) is LFP-OA and thus amenable to LT-PDR.
We call this instance PDR^{MRM}. It seems new. See [22, Appendix F] for
details.
Structural Derivation. The function F′ in Definition 5.5 can be expressed
categorically as F′(x) = d_α ∧ δ*Ġ(x), where d_α: S → [0,∞] carries s ∈ α to ∞
and s ∉ α to 0, and Ġ is a suitable lifting that accumulates expected reward.
However, the SP (Definition 5.5) is not an instance of the three general safety
problems in Sect. 4.2. Consequently, we expect that instances of LT-PDR other
than PDR^{MRM} (such as PDR^{F-Kr} and PDR^{IB-Kr} in Sect. 5.1) are hard for
MRMs.


6 Implementation and Evaluation 


Implementation. We implemented LT-PDR in Haskell. Exploiting
Haskell's language features, it is succinct (~50 lines) and almost a literal
translation of Algorithm 3 to Haskell. Its main part is presented in [22, Appendix K].
In particular, using suitable type classes, the code is as abstract and generic as
Algorithm 3.

Specifically, our implementation is a Haskell module named LTPDR. It has
two interfaces, namely the type class CLat τ (the lattice of predicates) and the
type Heuristics τ (the definitions of Candidate, Decide, and Conflict). The
main function for LT-PDR is

  ltPDR :: CLat τ => Heuristics τ -> (τ -> τ) -> τ -> IO (PDRAnswer τ),

where the second argument is for a monotone function F of type τ -> τ and the
last is for the safety predicate α.

Obtaining concrete instances is easy by fixing τ and Heuristics τ. A simple
implementation of PDR^{F-Kr} takes 15 lines; a more serious SAT-based one for
PDR^{F-Kr} takes ~130 lines; PDR^{IB-MDP} and PDR^{MRM} take ~80 lines each.


Heuristics. We briefly discuss the heuristics, i.e. how to choose x ∈ L in
Candidate, Decide, and Conflict, used in our experiments. The heuristics of
PDR^{F-Kr} are based on the conventional PDR [9]. The heuristics of PDR^{IB-MDP}
are based on the idea of representing the smallest possible x greater than some
real number v ∈ [0,1] (e.g. x taken in Candidate) as x = v + ε, where ε is a symbolic
variable. This implies that Unfold (or Valid, Model) is always applied
in finite steps, which further guarantees finite-step termination for invalid cases
and ω-step termination for valid cases (see [22, Appendix H] for more detail).
The heuristics of PDR^{MRM} are similar to those of PDR^{IB-MDP}.


Experiment Setting. We experimentally assessed the performance of instances
of LTPDR. The settings are as follows: 1.2 GHz Quad-Core Intel Core i7 with 10 GB
memory using Docker, for PDR^{IB-MDP}; Apple M1 Chip with 16 GB memory
for the others. The different setting is because we needed Docker to run PrIC3 [6].


Experiments with PDR^{MRM}. Table 2a shows the results. We observe that
PDR^{MRM} answered correctly, and that the execution time is reasonable. Further
performance analysis (e.g. comparison with [19]) and improvement are future
work; the point here, nevertheless, is that we obtained a reasonable
MRM model checker by adding ~80 lines to the generic solver LTPDR.
Experiments with PDR^{IB-MDP}. Table 2c shows the results. Both PrIC3
and our PDR^{IB-MDP} solve a linear programming (LP) problem in Decide.
PrIC3 uses Z3 for this; PDR^{IB-MDP} uses GLPK. PrIC3 represents an MDP
symbolically, while PDR^{IB-MDP} does so concretely. Symbolic representation in
PDR^{IB-MDP} is possible; it is future work. PrIC3 can use four different
interpolation generalization methods, leading to different performance (Table 2c).

We observe that PDR^{IB-MDP} outperforms PrIC3 for some benchmarks
with smaller state spaces. We believe that the failure of PDR^{IB-MDP} in many
instances can be attributed to our current choice of a generalization method (it is
the closest to the linear one for PrIC3). Table 2c suggests that using the polynomial
or hybrid generalization method can enhance the performance.


Experiments with PDR^{F-Kr}. Table 2b shows the results. The benchmarks
are mostly from the HWMCC'15 competition [1], except for latch0.smv¹ and
counter.smv (our own).

IC3ref vastly outperforms PDR^{F-Kr} in many instances. This is hardly a
surprise: IC3ref was developed towards superior performance, while the emphasis
of PDR^{F-Kr} is on its theoretical simplicity and genericity. We nevertheless see that
PDR^{F-Kr} solves some benchmarks of substantial size, such as power2bit8.smv.
This demonstrates the practical potential of LT-PDR, especially in view of the
following improvement opportunities (we will pursue them as future work): 1)
use of well-developed SAT solvers (we currently use toysolver² for its good
interface but we could use Z3); 2) allowing |C_i| > 1, a technique discussed in
Sect. 5.1 and implemented in IC3ref but not in PDR^{F-Kr}; and 3) other small
improvements, e.g. in our CNF-based handling of propositional formulas.




Ablation Study. To assess the value of the key concept of PDR (namely the
positive-negative interplay between the Knaster–Tarski and Kleene theorems
(Sect. 3.3)), we compared PDR^{Kr} with the instances of positive and negative
LT-PDR (Sects. 3.1–3.2) for Kripke structures.

Table 2d shows the results. Note that the value of the positive-negative inter- 
play is already theoretically established; see e.g. Proposition 3.17 (the interplay 
detects executions that lead to nowhere). This value was also experimentally wit- 
nessed: see power2bit8.smv and simpleTrans.smv, where the one-sided meth- 
ods made wrong choices and timed out. One-sided methods can be efficient 
when they get lucky (e.g. in counter.smv). LT-PDR may be slower because of 
the overhead of running two sides, but that is a trade-off for the increased chance 
of termination. 


Discussion. We observe that all of the studied instances exhibited at least
reasonable performance. We note again that detailed performance analysis and
improvement are out of our current scope. Being able to derive these model
checkers with as little effort as ~100 lines of Haskell code each demonstrates
the value of our abstract theory and its generic Haskell implementation LTPDR.


¹ https://github.com/arminbiere/aiger.
² https://github.com/msakai/toysolver.
Table 2. Experimental results for our PDR^{Kr}, PDR^{IB-MDP} and PDR^{MRM}


(a) Results with PDR^{MRM}. The MRM is from [4, Example 10.72], whose ground
truth expected reward is 4. The benchmarks ask if the expected reward (not known
to the solver) is ≤ 1.5 or ≤ 1.3.

Benchmark          Result   Time
DieByCoin ≤ 1.5    True     6.01 ms
DieByCoin ≤ 1.3    False    43.1 μs

(b) Results with PDR^{Kr} in comparison with IC3ref, a reference implementation
of [9] (https://github.com/arbrad/IC3ref). Both solvers answered correctly.
Timeout (TO) is 600 sec.

Benchmark          |S|    Result   PDR^{Kr}   IC3ref
latch0.smv         2^?    True     317 μs     270 μs
counter.smv        2^?    False    1.620 s    3.27 ms
power2bit8.smv     2^15   True     1.516 s    4.13 ms
ndista128.smv      2^17   True     TO         73.1 ms
shiftladd256.smv   2^21   True     TO         174 ms


(c) Results with PDR^{IB-MDP} (an excerpt of [22, Table 3]). Comparison is against
PrIC3 [6] with four different interpolation generalization methods (none, linear,
polynomial, hybrid). The benchmarks are from [6]. |S| is the number of states of the
benchmark MDP. “GT pr.” is for the ground truth probability, that is, the
reachability probability Pr^max(s_ι ⊨ ◇(S \ α)) computed outside the solvers under
experiments. The solvers were asked whether the GT pr. (which they do not know) is
≤ λ or not; they all answered correctly. The last five columns show the average
execution time in seconds. “—” is for “did not finish,” due to out of memory or
timeout (600 sec.).

Benchmark     |S|    GT pr.   λ       PDR^{IB-MDP}   PrIC3
                                                     none     lin.     pol.     hyb.
Grid          10^?   ?        ?       0.31           1.31     19.34    —        —
                              0.2     0.48           1.75     24.62    —        —
Grid          10^?   ?        ?       ?              ?        ?        ?        ?
                              0.2     136.46         —        —        —        —
                              0.1     —              —        —        —        —
BRP           10^?   0.035    0.01    18.52          56.55    594.89   —        722.38
                              0.005   1.36           11.68    238.09   —        —
ZeroConf      10^4   0.5      0.9     —              —        —        0.58     0.51
                              0.75    —              —        —        0.55     0.46
                              ?       ?              ?        ?        ?        ?
                              0.52    —              —        —        0.48     0.46
                              0.45    <0.1           <0.1     <0.1     <0.1     <0.1
Chain         10^?   0.394    0.9     —              72.37    —        0.91     0.70
                              0.4     ?              ?        ?        ?        ?
                              0.35    177.12         115.98   —        —        —
                              0.3     88.27          66.89    557.68   —        —
DoubleChain   10^?   0.215    0.9     —              —        —        1.83     1.99
                              0.3     —              —        —        1.88     1.96
                              ?       ?              ?        ?        ?        ?
                              0.216   —              —        —        139.76   —
                              0.15    7.46           —        —        —        —


(d) Ablation experiments: LT-PDR (PDR^{Kr}) vs. positive and negative LT-PDRs, implemented
for the FSP for Kripke structures. The benchmarks are as in Table 2b, except for a new micro
benchmark simpleTrans.smv. Timeout (TO) is 600 sec.

Benchmark         Result   LT-PDR    positive   negative
latch0.smv        True     317 μs    1.68 ms    TO
power2bit8.smv    True     1.516 s   TO         TO
counter.smv       False    1.620 s   TO         2.88 μs
simpleTrans.smv   False    295 μs    TO         TO
7 Conclusions and Future Work


We have presented a lattice-theoretic generalization of the PDR algorithm called 
LT-PDR. This involves the decomposition of the PDR algorithm into positive 
and negative ones, which are tightly connected to the Knaster–Tarski and Kleene
fixed point theorems, respectively. We then combined it with the coalgebraic and 
fibrational theory for modeling transition systems with predicates. We instanti- 
ated it with several transition systems, deriving existing PDR algorithms as well 
as a new one over Markov reward models. We leave instantiating our LT-PDR 
and categorical safety problems to derive other PDR-like algorithms, such as 
PDR for hybrid systems [29], for future work. 

We will also work on combining our work with the theory of abstract
interpretation [10,12]. Our current framework axiomatizes what is needed of
heuristics, but it does not tell how to realize such heuristics (which differ greatly
across concrete settings). We expect abstract interpretation to provide some
general recipes for realizing such heuristics.
Abstract. Loop invariant generation, which automates the generation
of assertions that always hold at the entry of a while loop, has many
important applications in program analysis and formal verification. In
this work, we target an important category of while loops, namely affine
while loops, which are unnested while loops with affine loop guards and
variable updates. Such loops are common in many programs, yet there is
still no general and efficient approach to generating their invariants. We
propose a novel matrix-algebra approach to automatically synthesizing
affine inductive invariants in the form of an affine inequality. The main
novelty of our approach is that (i) it is general, in the sense
that it theoretically addresses all the cases of affine invariant generation
over an affine while loop, and (ii) it can be efficiently automated through
matrix-algebra methods (such as eigenvalue computation and matrix inversion).

The details of our approach are as follows. First, for the case where 
the loop guard is a tautology (i.e., ‘true’), we show that the eigenvalues 
and their eigenvectors of the matrices derived from the variable updates 
of the loop body encompass all meaningful affine inductive invariants. 
Second, for the more general case where the loop guard is a conjunction
of affine inequalities, our approach completely addresses the invariant-generation
problem by first establishing, through matrix inversion, the relationship
between the invariants and a key parameter in the application
of Farkas’ lemma, then solving the feasible domain of the key parameter
from the inductive conditions, and finally showing that a finite number
of values for the key parameter suffices w.r.t. a tightness condition
on the invariants to be generated.

Experimental results show that, compared with previous approaches,
our approach generates much more accurate affine inductive invariants
over affine while loops from existing and new benchmarks within a few
seconds, demonstrating the generality and efficiency of our approach.


1 Introduction 


An invariant is a logical assertion at a certain program location that always holds 
whenever the program executes across that location. Invariants are indispens- 
able parts of program analysis and formal verification, and thus the generation of 
invariants has been key to the proof and analysis of crucial properties like
reachability [3,6,15], time complexity [9] and safety [2,32]. To ease program analysis
and formal verification, there has been a long thread of research on approaches
to automatic generation of invariants, including constraint solving [10,12,27],
recurrence analysis [17,24,29,31], abstract interpretation [13,14], logical
inference [18,19,38], dynamic analysis [33,39], and machine learning [20,23,44]. To
guarantee that an assertion is indeed an invariant, the widely adopted paradigm
is to generate an inductive invariant, i.e., one that holds when execution first reaches
the program location and is preserved each time execution returns to it [12,32]. In this
work, we consider an important subclass of invariants called numerical invariants,
which are assertions over the numerical values taken by the program variables,
and are closely related to many common vulnerabilities like integer overflow,
buffer overflow, division by zero and array out-of-bound access. More specifically, we
consider affine inductive invariants in the form of an affine inequality over program
variables, and focus on affine while loops that have affine loop guards (as
a conjunction of affine inequalities) and affine updates for the program variables
but do not have nested loops.

To automate the generation of affine inductive invariants, we adopt the
constraint-solving-based approach, which has three steps. First, it establishes a
template with unknown parameters for the target invariants. Second, it collects
constraints derived from the inductive conditions. Finally, it solves for the unknown
parameters to obtain the desired invariants. Prior work in this space [12,37] leverages
Farkas’ lemma to provide a sound and complete characterization for the
inductive conditions and then generates the affine inductive invariants either by 
the complete approach of quantifier elimination [12] or through several heuris- 
tics [37]. Specifically, the StInG invariant generator [40] implements the approach 
in [37], and the InvGen invariant generator [22] integrates abstract interpreta- 
tion as well as the approach in [37]. Furthermore, a recent effort [34] leverages 
eigenvalues and eigenvectors for inferring a restricted class of invariants. Finally, 
some recent work considers decidable logic fragments that directly verify prop- 
erties of loops [4,11,28,30]. Compared with other approaches such as machine 
learning and dynamic analysis, constraint solving has a theoretical guarantee on 
the correctness and accuracy of the generated invariants, yet typically at the 
cost of higher runtime complexity. 

The novelty of our approach lies in completely resolving the constraints
derived from Farkas’ lemma by matrix methods, thus ensuring both
generality and efficiency. In detail, this paper makes the following contributions
(due to the page limit, the current paper is abridged; the full version is available
at [25]):


– For affine while loops with a tautological guard, we prove that the affine
inductive invariants are determined by the eigenvalues and eigenvectors of the
matrices that describe variable updates in the loop body.

– For affine while loops whose loop guard is a conjunction of affine inequalities,
we compute the affine inductive invariants by first deriving, through matrix
inversion, a formula in a key parameter arising in the application of Farkas’ lemma,
then solving the feasible domain of the key parameter from the inductive
conditions, and finally showing that it suffices to choose a finite number of values
for the key parameter if one imposes a tightness condition on the invariants.

– We generalize our results to affine while loops with non-deterministic updates
and to bidirectional affine invariants. A continuity property of the invariants
w.r.t. the key parameter is also proved for tackling the numerical issues
arising from the computation of eigenvectors. Experimental results on existing
benchmarks and new benchmarks arising from linear dynamical systems
demonstrate the generality and efficiency of our approach.


1.1 Related Work 


Constraint Solving. There have been several prior approaches [12,37] using
constraint solving for invariant generation based on Farkas’ lemma. Compared
to the approach in [12], which uses quantifier elimination to solve the constraints
from Farkas’ lemma, our approach is more efficient since it only involves
matrix computation. Compared with [37], which uses several heuristics, our
approach is more general and complete in addressing all the cases in affine invariant
generation. While the approach in [34] also uses eigenvectors, it is restricted to
the subclass of equality and convergent invariants. In contrast, our approach
targets general affine inductive invariants over affine while loops. Other prior
work [4,11,28,30] develops decidable logics for unnested affine while
loops with a tautological guard and no conditional branches. Compared with them,
our approach handles general affine while loops and targets invariant generation.


Abstract Interpretation. A long thread of research infers inductive invariants
using the abstract interpretation framework [1,7,22,35], which constructs
sound approximations of program semantics. In a nutshell, it first establishes
an abstract domain for the specific form of properties to be generated, and then
performs fixed-point computation in the abstract domain. Except in rare special
cases [21,37], the precision of the invariants generated by abstract interpretation
depends on the abstract domain and abstract operators.


Recurrence Analysis. Another closely related technique is recurrence
analysis [8,17,24,29,31]. The main idea is to transform the problem of invariant
generation into a recurrence-relation problem and then solve the latter. The
main limitation of recurrence analysis is that it requires the underlying
recurrence relation to have a closed-form solution. This requirement, unfortunately,
does not hold for the general case of affine inductive invariants over affine while 
loops. 


Logical Inference. Invariants could also be obtained through logical inference, 
such as abductive inference [16], Craig interpolation [18], ICE learning [19,43], 
random search [38], etc. These approaches, however, cannot provide any theoret- 
ical guarantee on the accuracy of the generated numerical invariants. In contrast, 
our approach essentially addresses this issue. 
Dynamic Analysis. Dynamic analysis [33,39] has also been exploited for invariant
generation. The process is to first collect the execution traces of a
particular program by running it multiple times, and then guess the invariants
based on these traces. As its process indicates, dynamic analysis provides no
guarantee on the correctness or accuracy of the inferred invariants, yet still pays
the price of running the program a large number of times.


Machine Learning. There is a recent trend of applying machine learn- 
ing [20,23,44] to solve the invariant-generation problem. Such approaches first 
establish a (typically large) training set of data, then use training approaches 
such as neural networks to generate invariants. Compared to our approach, those 
approaches require a large training set, while still having no theoretical guaran- 
tee on the correctness or accuracy. Specifically, such approaches cannot produce 
specific numerical values (e.g., eigenvalues) that are required to handle some 
examples in this work. 


2 Preliminaries 


In this section, we specify the class of affine while loops and define the
affine-invariant-generation problem over such loops. Throughout the paper, we use
$V = \{x_1, \ldots, x_n\}$ to denote the set of program variables in an affine while loop;
we abuse the notation $V$ so that it also represents the current values (before the
execution of the loop body) of the original variables in $V$, and use the primed
variables $V' := \{x' \mid x \in V\}$ for the next values (after the execution of the
loop body). Furthermore, we denote by $x = [x_1, \ldots, x_n]^T$ the vector variable that
represents the current values of the program variables, and by $x' = [x'_1, \ldots, x'_n]^T$
the vector variable for the next values.

An affine while loop is a while loop without nested loops that has affine 
updates in each assignment statement and possibly multiple conditional branches 
in the loop body. To formally specify its syntax, we first define affine
inequalities and assertions, program states, and the satisfaction relation between
them as follows.


Affine Inequalities and Assertions. An affine inequality $\phi$ is an inequality
of the form $c^T \cdot y + d \le 0$, where $c$ is a real vector, $y$ is a vector of real-valued
variables, and $d$ is a real scalar. An affine assertion is a finite conjunction of affine
inequalities. An affine assertion is satisfiable if it is true under some assignment
of real values to its variables. Given an affine assertion $\psi$ over vector variable
$x$, we denote by $\psi'$ the affine assertion obtained by substituting $x$ in $\psi$ with its
next-value variable $x'$.


Program States. A program state $v$ is a real vector $v = [v_1, \ldots, v_n]^T$ such that
each $v_i$ is a concrete value for the variable $x_i$ (in the vector variable $x$). We say
that a program state $v$ satisfies an affine inequality $\phi = c^T \cdot x + d \le 0$, written as
$v \models \phi$, if it holds that $c^T \cdot v + d \le 0$. Likewise, $v$ satisfies an affine assertion $\psi$ if
it satisfies every conjunctive affine inequality in $\psi$. Furthermore, given an affine
assertion $\psi$ with both $x$ and $x'$, we say that two program states $v, v'$ satisfy $\psi$,
written as $v, v' \models \psi$, if $\psi$ is true when one substitutes $x$ by $v$ and $x'$ by $v'$.
We now describe the syntax of (unnested) affine while loops.


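Affine While Loops. We consider affine while loops that take the form:

$$
\begin{array}{l}
\textbf{initial condition}\ \ \theta : R \cdot x + f \le 0\\
\textbf{while}\ \ G : P \cdot x + q \le 0\ \ \textbf{do}\\
\quad \textbf{case}\ \ \psi_1 : T_1 \cdot x - T'_1 \cdot x' + b_1 \le 0 \ \ (\tau_1);\\
\quad \quad \vdots\\
\quad \textbf{case}\ \ \psi_k : T_k \cdot x - T'_k \cdot x' + b_k \le 0 \ \ (\tau_k);\\
\textbf{end}
\end{array}
\qquad (\dagger)
$$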


where (i) $\theta$ is an affine assertion that specifies the initial condition for inputs and
is given by the real matrix $R$ and vector $f$, (ii) $G$ is an affine assertion serving
as the loop guard, given by the real matrix $P$ and vector $q$, and (iii) each $\psi_j$
is an affine assertion that represents a conditional branch, with the relationship
between the current-state vector $x$ and the next-state vector $x'$ given by the
affine assertion $\tau_j := T_j \cdot x - T'_j \cdot x' + b_j \le 0$ with transition matrices $T_j, T'_j$
and vector $b_j$. In this work, we always assume that the rows of $R$ are linearly
independent (this condition means that every variable $x_i$ has one independent
initial condition attached to it, which holds in most situations such as a fixed
initial program state), so that $R^T$ is left invertible; we denote its left inverse
by $(R^T)^{-1}_L$.

The execution of an affine while loop is as follows. First, the loop starts with
an arbitrary initial program state $v^*$ that satisfies the initial condition $\theta$. Then in
each loop iteration, the current program state $v$ is checked against the loop guard
$G$. If $v \models G$, the loop arbitrarily chooses a conditional branch $\psi_j$
satisfying $v \models \psi_j$, and sets the next program state $v'$ non-deterministically such
that $v, v' \models \tau_j$; the next program state $v'$ is then set as the current program
state. Otherwise (i.e., $v \not\models G$), the loop halts immediately.
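The following minimal Python sketch illustrates these semantics for the special case of a single branch with a deterministic update $x' = T \cdot x + b$. It does not model the non-deterministic updates or multiple branches of (†). All function names are ours. The `max_steps` cut-off is an added assumption, because a run of the loop need not terminate. The example data is the Fibonacci update used later in Sect. 4, with the guard $x_1 \le 10$. Checking an inequality along one run only tests it on the states visited; it does not show that the inequality is an invariant.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product m . v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def guard_holds(p: Matrix, q: Sequence[float], v: Sequence[float]) -> bool:
    """v |= G, i.e. every row of P . v + q is <= 0 (an empty P means 'true')."""
    return all(lhs + qi <= 0 for lhs, qi in zip(mat_vec(p, v), q))


def run_loop(t: Matrix, b: Sequence[float], p: Matrix, q: Sequence[float],
             v0: Sequence[float], max_steps: int = 1000) -> List[Vector]:
    """Run 'while P.x + q <= 0 do x' = T.x + b end' from the initial state v0.

    Returns every state that occurs as a current state: v0 and the state after
    each iteration. max_steps is a cut-off added here because a loop need not
    terminate; it is not part of the loop semantics.
    """
    v = [float(x) for x in v0]
    trace = [v]
    for _ in range(max_steps):
        if not guard_holds(p, q, v):
            break  # v does not satisfy G, so the loop halts
        v = [tv + bi for tv, bi in zip(mat_vec(t, v), b)]
        trace.append(v)
    return trace


def satisfies(c: Sequence[float], d: float, v: Sequence[float]) -> bool:
    """v |= c^T . x + d <= 0."""
    return sum(ci * vi for ci, vi in zip(c, v)) + d <= 0


# Illustration: the Fibonacci update x' = (x2, x1 + x2) guarded by x1 <= 10.
T = [[0.0, 1.0], [1.0, 1.0]]
trace = run_loop(T, [0.0, 0.0], [[1.0, 0.0]], [-10.0], [1.0, 1.0])
print(trace[-1])                                            # [13.0, 21.0]
print(all(satisfies([-1.0, 0.0], 1.0, v) for v in trace))   # True: x1 >= 1 on this run
```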

Now we define affine inductive invariants over affine while loops. Informally, 
an affine inductive invariant is an affine inequality satisfying the initiation and 
consecution conditions which mean that the inequality holds at the start of 
the loop (initiation) and is preserved under every iteration of the loop body 
(consecution). 


Affine Inductive Invariants. An affine inductive invariant for an affine while
loop (†) is an affine inequality $\Phi$ that satisfies the initiation and consecution
conditions as follows:

– (Initiation) $\theta$ implies $\Phi$, i.e., $v \models \theta$ implies $v \models \Phi$ for all program states $v$;
– (Consecution) for all program states $v, v'$ and every $\psi_j, \tau_j$ ($1 \le j \le k$) in
(†), we have that $(v \models G \land v \models \Phi \land v, v' \models \tau_j) \Rightarrow v' \models \Phi$.


From the definition above, it can be observed that an affine inductive invariant is 
an invariant, in the sense that every program state traversed (as a current state 


262 Y. Ji et al. 


at the start or after every loop iteration) in some execution of the underlying 
affine while loop will satisfy the affine inductive invariant. 

From now on, we abbreviate affine while loops as affine loops and affine 
inductive invariants as affine invariants. 


Problem Statement. In this work, we study the problem of automatically
generating affine invariants over affine loops. Our aim is to have a complete
mathematical characterization of all such invariants and to develop efficient
algorithms for generating them.


3 Affine Invariants via Farkas’ Lemma 


Affine invariant generation through Farkas’ lemma was originally proposed in [12,37].
Farkas’ lemma is a fundamental result in the theory of linear inequalities that
leads to a complete characterization of the affine invariants. Since our approach
is based on Farkas’ lemma, we present a detailed account of the approaches
in [12,37], and point out the weakness of each of them.


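Theorem 1 (Farkas’ Lemma). Consider the following affine assertion $S$ over
real-valued variables $y_1, \ldots, y_n$:

$$
S:\ \begin{cases}
a_{11} y_1 + \cdots + a_{1n} y_n + b_1 \le 0\\
\qquad \vdots\\
a_{k1} y_1 + \cdots + a_{kn} y_n + b_k \le 0
\end{cases}
$$

When $S$ is satisfiable, it entails a given affine inequality

$$
\phi:\ c_1 y_1 + \cdots + c_n y_n + d \le 0
$$

if and only if there exist non-negative real numbers $\lambda_0, \ldots, \lambda_k$ such that (i)
$c_j = \sum_{i=1}^{k} \lambda_i a_{ij}$ for $1 \le j \le n$ and (ii) $d = \left(\sum_{i=1}^{k} \lambda_i b_i\right) - \lambda_0$.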


The application of Farkas’ lemma can be visualized by a table form as follows:

$$
\begin{array}{r|l}
\lambda_0 & -1 \le 0\\
\lambda_1 & a_{11} y_1 + \cdots + a_{1n} y_n + b_1 \le 0\\
\vdots & \qquad\qquad \vdots \qquad\qquad\qquad (S)\\
\lambda_k & a_{k1} y_1 + \cdots + a_{kn} y_n + b_k \le 0\\
\hline
 & c_1 y_1 + \cdots + c_n y_n + d \le 0 \qquad (\phi)
\end{array}
\qquad (\ddagger)
$$

The intuition behind the table form above is that one first multiplies each $\lambda_i$ on the
left with the corresponding affine inequality (in the same row) on the right, and
then sums these affine inequalities to obtain the affine inequality at the
bottom. In this paper, we call this table form a Farkas table.
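The multiply-and-sum reading of a Farkas table can be checked mechanically. Below is a minimal Python sketch that computes the bottom inequality from given multipliers, using exact rational arithmetic from the standard `fractions` module. The helper name and the encoding of each row as a pair $(a_i, b_i)$ are our own choices. The sketch only verifies a proposed certificate. Finding multipliers when $c$ and $d$ are themselves unknown is the harder problem addressed in the remainder of the paper.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

# (a_i, b_i) encodes the affine inequality a_i . y + b_i <= 0.
Row = Tuple[List[Fraction], Fraction]


def farkas_combination(rows: Sequence[Row], lam0: Fraction,
                       lams: Sequence[Fraction]) -> Row:
    """Bottom line of a Farkas table: lam0 * (-1 <= 0) + sum_i lam_i * (row i).

    Returns (c, d) with c_j = sum_i lam_i * a_ij and d = sum_i lam_i * b_i - lam0,
    i.e. the inequality c . y + d <= 0 entailed by S under these multipliers.
    """
    if len(lams) != len(rows):
        raise ValueError("need exactly one multiplier per inequality of S")
    if lam0 < 0 or any(lam < 0 for lam in lams):
        raise ValueError("Farkas multipliers must be non-negative")
    n = len(rows[0][0])
    c = [sum((lam * a[j] for lam, (a, _) in zip(lams, rows)), Fraction(0))
         for j in range(n)]
    d = sum((lam * b for lam, (_, b) in zip(lams, rows)), Fraction(0)) - lam0
    return c, d


# S: y1 - 1 <= 0 and y2 - 2 <= 0.  With lam0 = 1 and lam = (1, 1) we obtain
# y1 + y2 - 4 <= 0, a (non-tight) consequence of S.
S = [([Fraction(1), Fraction(0)], Fraction(-1)),
     ([Fraction(0), Fraction(1)], Fraction(-2))]
c, d = farkas_combination(S, Fraction(1), [Fraction(1), Fraction(1)])
print(c, d)
```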

Given an affine loop as in (†), the approaches in [12,37] first establish a template
$\Phi : c_1 x_1 + \cdots + c_n x_n + d \le 0$ for an affine invariant, where $c_1, \ldots, c_n, d$ are
the unknown coefficients. Second, they establish constraints for the unknown
coefficients from the initiation and consecution conditions for an affine invariant,
as follows.


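Initiation. By Farkas’ lemma, the initiation condition can be solved from the
Farkas table (‡) with $S := \theta$ and $\phi := \Phi$:

$$
\begin{array}{r|l}
\lambda_0 & -1 \le 0\\
\lambda & R \cdot x + f \le 0 \qquad (\theta)\\
\hline
 & c^T \cdot x + d \le 0 \qquad (\Phi)
\end{array}
\qquad (\#)
$$

Here we rephrase the affine inequalities in $\theta$ and $\Phi$ with the condensed matrix
forms $R \cdot x + f \le 0$ and $c^T \cdot x + d \le 0$; we also use $\lambda = [\lambda_1, \ldots, \lambda_k]^T$ to denote
the non-negative parameters in the leftmost column of (‡).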


Consecution. The consecution condition can be solved by handling each
conditional branch (specified by $\tau_j, \psi_j$ in (†)) separately. By Farkas’ lemma, we treat
each conditional branch by the Farkas table (‡) with $S := \Phi \land G \land \tau_j$ and $\phi := \Phi'$:

$$
\begin{array}{r|l}
\mu & c^T \cdot x + d \le 0 \qquad (\Phi)\\
\lambda'_0 & -1 \le 0\\
\xi & P \cdot x + q \le 0 \qquad (G)\\
\eta & T_j \cdot x - T'_j \cdot x' + b_j \le 0 \qquad (\tau_j)\\
\hline
 & c^T \cdot x' + d \le 0 \qquad (\Phi')
\end{array}
\qquad (\ast)
$$

Note that the Farkas table above contains quadratic constraints, as we multiply
an unknown non-negative parameter $\mu$ with the unknown invariant $c^T \cdot x + d \le 0$
in the table. The Farkas tables for all conditional branches are grouped
conjunctively to represent the whole consecution condition.

The weakness of the approaches presented in [12,37] lies in the treatment of
the quadratic constraints from the consecution condition. The approach in [12]
addresses the quadratic constraints by quantifier elimination, which guarantees
theoretical completeness but typically has high runtime complexity. The
approach in [37] solves the quadratic constraints by several heuristics that guess
possible values for the key parameter $\mu$ in (∗), which is the source of the
non-linearity, hence losing completeness. Our approach addresses the parameter $\mu$ through
matrix-based methods (eigenvalues and eigenvectors, matrix inverse, etc.), which
can efficiently generate affine invariants (as compared with quantifier
elimination in [12]) while still ensuring theoretical completeness (as compared
with the heuristics in [37]).


4 Single-Branch Affine Loops with Deterministic Updates 


For the sake of simplicity, we first consider the affine invariant generation for a 
simple class of affine loops where there are no conditional branches in the loop 
body and the updates of the next-value vector x’ are deterministic. 
Formally, an affine loop with deterministic updates and a single branch takes
the following form:

$$
\begin{array}{l}
\textbf{initial condition}\ \ \theta : R \cdot x + f \le 0\\
\textbf{while}\ \ G\ \ \textbf{do}\ \ x' = T \cdot x + b;\ \ \textbf{end}
\end{array}
$$


For the loop above, we aim at non-trivial affine invariants, i.e., affine invariants
$c^T \cdot x + d \le 0$ with $c \ne 0$. We summarize our results below.


1. When the loop guard is ‘true’, there are only finitely many independent
non-trivial invariants $c^T \cdot x + d \le 0$, where $c$ is an eigenvector of the transpose of
the transition matrix $T$.

2. When the loop guard is not a tautology, there can be infinitely many more
non-trivial invariants $c^T \cdot x + d \le 0$ with $c$ given by a direct formula in $\mu$; in
this case we derive the feasible domain of $\mu$ and select finitely many optimal
ones (which we call tight choices) among them.


In Sect. 4.1, we first derive the constraints from the initiation (#) and
consecution (∗) conditions satisfied by the invariants. Then we solve these constraints
for the tautological loop guard case in Sect. 4.2 and the single-constraint loop
guard case in Sect. 4.3. Finally, we generalize the results to the multi-constraint
loop guard case in Sect. 4.4.


4.1 Derived Constraints from the Farkas Tables 
We first derive the constraints from the Farkas tables as follows: 


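Initiation. Recall the Farkas table (#) for initiation. We first compare the
coefficients of $x$ above and below the horizontal line in (#), and obtain

$$
\lambda^T \cdot R = c^T \;\Rightarrow\; R^T \cdot \lambda = c. \tag{1}
$$

Then by comparing the constant terms in (#), we have:

$$
-\lambda_0 + \lambda^T \cdot f = d \;\Rightarrow\; f^T \cdot \lambda - d = \lambda_0 \ge 0. \tag{2}
$$

Note that $R^T$ has left inverse $(R^T)^{-1}_L$, thus constraint (1) is equivalent to
$\lambda = (R^T)^{-1}_L \cdot c$. Plugging it into (2) yields

$$
f^T \cdot (R^T)^{-1}_L \cdot c - d = \lambda_0 \ge 0. \tag{3}
$$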
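Once $R$, $f$, $c$, and $d$ are fixed, constraint (3) can be evaluated directly. The sketch below is an illustration under our own assumptions. It realizes the left inverse as $(R^T)^{-1}_L = (R R^T)^{-1} R$, one standard choice that exists when the rows of $R$ are linearly independent. When $c$ lies in the column space of $R^T$, any left inverse yields the same $\lambda$. The sketch uses exact rationals and a small Gauss–Jordan inversion rather than a numerical library.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum((x * y for x, y in zip(row, col)), Fraction(0)) for col in zip(*b)]
            for row in a]


def inverse(m: Matrix) -> Matrix:
    """Inverse of a square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def left_inverse_of_transpose(r: Matrix) -> Matrix:
    """(R^T)^{-1}_L = (R R^T)^{-1} R; R R^T is invertible when R has independent rows."""
    return mat_mul(inverse(mat_mul(r, transpose(r))), r)


def initiation_slack(r: Matrix, f: List[Fraction], c: List[Fraction],
                     d: Fraction) -> Fraction:
    """lambda_0 = f^T (R^T)^{-1}_L c - d from constraint (3); it must be >= 0.

    Assumes c = R^T lambda is solvable (c in the column space of R^T);
    the sketch does not check this.
    """
    lam = [row[0] for row in mat_mul(left_inverse_of_transpose(r), [[ci] for ci in c])]
    return sum((fi * li for fi, li in zip(f, lam)), Fraction(0)) - d


# theta: x1 + x2 - 3 <= 0 (R = [[1, 1]], f = [-3]); candidate Phi: 2*x1 + 2*x2 - 7 <= 0.
R = [[Fraction(1), Fraction(1)]]
print(initiation_slack(R, [Fraction(-3)], [Fraction(2), Fraction(2)], Fraction(-7)))  # 1
```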


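Consecution. The Farkas table (∗) for consecution in the case of single-branch
affine loops with deterministic updates is as follows:

$$
\begin{array}{r|l}
\mu & c^T \cdot x + d \le 0 \qquad (\Phi)\\
\lambda'_0 & -1 \le 0\\
\xi & P \cdot x + q \le 0 \qquad (G)\\
\eta & T \cdot x - x' + b = 0 \qquad (\tau)\\
\hline
 & c^T \cdot x' + d \le 0 \qquad (\Phi')
\end{array}
$$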
Here the transition matrix $T$ is an $n \times n$ square matrix, and $b$ is an $n$-dimensional
vector. Since $\tau$ contains only equalities, the components $\eta_1, \ldots, \eta_n$ of the vector
parameter $\eta$ do not have to be non-negative (while the components of
$\xi$ and $\mu$ must be non-negative). In this table, by comparing the coefficients of $x'$
above and below the horizontal line, we easily get $-\eta = c$. Then we substitute
$\eta$ by $-c$ and compare the coefficients of $x$ above and below the horizontal line.
We get

$$
\mu \cdot c^T + \xi^T \cdot P - c^T \cdot T = 0^T \;\Rightarrow\; \mu \cdot c - T^T \cdot c + P^T \cdot \xi = 0. \tag{4}
$$

We also compare the constant terms and get

$$
\mu \cdot d - \lambda'_0 + \xi^T \cdot q - c^T \cdot b = d \;\Rightarrow\; (\mu - 1)\,d - b^T \cdot c + q^T \cdot \xi = \lambda'_0 \ge 0. \tag{5}
$$

The rest of this section is devoted to solving for the invariants $\Phi : c^T \cdot x + d \le 0$
that satisfy all constraints (1)–(5).
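For a concrete candidate $(c, d, \mu, \xi)$, constraints (4) and (5) reduce to plain arithmetic. The following Python sketch returns the residual of (4) and the value $\lambda'_0$ defined by (5). It is our own helper and works in floating point, so (4) is compared up to a tolerance. The sample values are the Fibonacci invariant derived in Examples 1 and 2 of Sect. 4.2. There the guard is ‘true’, so $P$, $q$, and $\xi$ are empty.

```python
import math
from typing import List, Sequence, Tuple


def consecution_check(t: List[List[float]], b: Sequence[float],
                      p: List[List[float]], q: Sequence[float],
                      c: Sequence[float], d: float, mu: float,
                      xi: Sequence[float]) -> Tuple[List[float], float]:
    """Residual of (4) and the value lambda'_0 given by (5).

    (4): mu*c - T^T c + P^T xi = 0        -> returned residual should be ~0
    (5): (mu - 1)*d - b^T c + q^T xi      -> equals lambda'_0, must be >= 0
    P is a list of guard rows; pass p = [], q = [], xi = [] for a 'true' guard.
    The requirements mu >= 0 and xi >= 0 are the caller's responsibility.
    """
    n = len(c)
    tt_c = [sum(t[i][j] * c[i] for i in range(n)) for j in range(n)]         # T^T c
    pt_xi = [sum(p[k][j] * xi[k] for k in range(len(p))) for j in range(n)]  # P^T xi
    residual = [mu * c[j] - tt_c[j] + pt_xi[j] for j in range(n)]
    lam0_prime = ((mu - 1) * d - sum(bi * ci for bi, ci in zip(b, c))
                  + sum(qk * xk for qk, xk in zip(q, xi)))
    return residual, lam0_prime


# Candidate from the Fibonacci example of Sect. 4.2 (tautological guard, b = 0).
s5 = math.sqrt(5)
res, lam0p = consecution_check([[0.0, 1.0], [1.0, 1.0]], [0.0, 0.0], [], [],
                               [-2.0, -1.0 - s5], 3.0 + s5, (1.0 + s5) / 2, [])
print(max(abs(r) for r in res) < 1e-9, lam0p >= 0)   # True True
```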
4.2 Loops with Tautological Guard 


We first consider the simplest case where the loop guard is ‘true’:

$$
\begin{array}{l}
\textbf{initial condition}\ \ \theta : R \cdot x + f \le 0\\
\textbf{while true do}\ \ x' = T \cdot x + b;\ \ \textbf{end}
\end{array}
\qquad (\diamond)
$$

To completely solve the non-linear constraints, we take three steps:

1. choose the correct $\mu$, thus turning the non-linear constraints into linear ones;

2. use linear-algebra methods to solve for the vector $c$;

3. with $\mu$ and $c$ known, find the feasible domain of $d$ and determine its
optimal value. Here ‘optimality’ means that all invariants
with other $d$'s in this domain are implied by the invariant with the ‘optimal’ $d$.


Step 1 and Step 2. We address the values of $\mu$ and $c$ by eigenvalues and
eigenvectors in the following proposition:

Proposition 1. For any non-trivial invariant $c^T \cdot x + d \le 0$ of the loop (◇), we
have that $c$ must be an eigenvector of $T^T$ with a non-negative eigenvalue $\mu$.

Proof. Since the loop guard is a tautology, we take the parameter $\xi$ to be $0$ in (4):

$$
\mu \cdot c - T^T \cdot c = 0.
$$

It is obvious that $\mu$ must be a non-negative eigenvalue of $T^T$ and $c$ is the
corresponding eigenvector.
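Example 1 (Fibonacci numbers). Consider the sequence $\{s_n\}$ defined by the initial
condition $s_1 = s_2 = 1$ and the recursive formula $s_{n+2} = s_{n+1} + s_n$ for $n \ge 1$. If we
use variables $(x_1, x_2)$ to represent $(s_n, s_{n+1})$, then the sequence can be written
as a loop:

$$
\begin{array}{l}
\textbf{initial condition}\ \ \theta : R \cdot x + f = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix} \cdot \begin{bmatrix}x_1\\ x_2\end{bmatrix} + \begin{bmatrix}-1\\ -1\end{bmatrix} = 0\\[2ex]
\textbf{while true do}\ \ \begin{bmatrix}x'_1\\ x'_2\end{bmatrix} = T \cdot \begin{bmatrix}x_1\\ x_2\end{bmatrix} + b = \begin{bmatrix}0 & 1\\ 1 & 1\end{bmatrix} \cdot \begin{bmatrix}x_1\\ x_2\end{bmatrix} + 0;\ \ \textbf{end}
\end{array}
$$

The eigenvalues of matrix $T^T$ are $\frac{1-\sqrt{5}}{2}$ and $\frac{1+\sqrt{5}}{2}$, and only the second one is
non-negative. This eigenvalue $\mu = \frac{1+\sqrt{5}}{2}$ yields the eigenvector $c = \left[c_1, \frac{1+\sqrt{5}}{2} c_1\right]^T$, where
$c_1$ is a free variable, which can be fixed in the final form of the invariant.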
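Proposition 1 reduces Steps 1 and 2 to an eigen-decomposition of $T^T$, and for $2 \times 2$ matrices this has a closed form. The sketch below computes the real eigenpairs and keeps those with $\mu \ge 0$, reproducing Example 1 with $c_1 = 1$. The function name and the choice of eigenvector representative are ours. A general implementation would more likely use a numerical linear-algebra library.

```python
import math
from typing import List, Tuple


def eigenpairs_2x2(m: List[List[float]]) -> List[Tuple[float, List[float]]]:
    """Real eigenvalues of a 2x2 matrix m, each paired with one eigenvector.

    The eigenvalues are the roots of mu^2 - tr(m)*mu + det(m) = 0. For a root mu,
    [m01, mu - m00] is an eigenvector when m01 != 0, and [mu - m11, m10] when
    m01 == 0 and m10 != 0. Complex eigenvalues are skipped: they give no real c.
    """
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        return []
    roots = sorted({(tr - math.sqrt(disc)) / 2, (tr + math.sqrt(disc)) / 2})
    pairs = []
    for mu in roots:
        if b != 0:
            vec = [b, mu - a]
        elif c != 0:
            vec = [mu - d, c]
        else:  # diagonal matrix: pick the matching unit vector
            vec = [1.0, 0.0] if abs(mu - a) <= abs(mu - d) else [0.0, 1.0]
        pairs.append((mu, vec))
    return pairs


# Example 1: T = [[0, 1], [1, 1]]; Proposition 1 keeps eigenpairs of T^T with mu >= 0.
T = [[0.0, 1.0], [1.0, 1.0]]
T_transposed = [list(row) for row in zip(*T)]
candidates = [(mu, v) for mu, v in eigenpairs_2x2(T_transposed) if mu >= 0]
print(candidates)   # a single pair: mu = (1 + sqrt(5))/2 with v = [1.0, mu]
```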


Step 3. After solving for $\mu$ and $c$, we give the feasible domain of $d$ and its
optimal value in the following proposition:

Proposition 2. For any $\mu$ and $c$ given by Proposition 1, the feasible domain of
$d$ is an interval determined by the two conditions below:

$$
d \le f^T \cdot (R^T)^{-1}_L \cdot c \quad\text{and}\quad (\mu - 1)\,d \ge b^T \cdot c.
$$

If the above conditions have an empty solution set, then no affine invariant is
available from such $\mu$ and $c$; otherwise, the optimal value of $d$ is one of the two
choices:

$$
d = f^T \cdot (R^T)^{-1}_L \cdot c \quad\text{or}\quad (\mu - 1)\,d = b^T \cdot c.
$$


Proof. Constraint (3) provides one condition for $d$:

$$
f^T \cdot (R^T)^{-1}_L \cdot c - d = \lambda_0 \ge 0 \;\Rightarrow\; f^T \cdot (R^T)^{-1}_L \cdot c \ge d;
$$

while constraint (5) with $\xi = 0$ provides the other condition:

$$
(\mu - 1)\,d - b^T \cdot c = \lambda'_0 \ge 0 \;\Rightarrow\; (\mu - 1)\,d \ge b^T \cdot c.
$$

To obtain the strongest inequality $c^T \cdot x + d \le 0$, we take $d$ to be the maximal
value of its interval, which is a boundary point: for a fixed $c$, a larger $d$ gives a
stronger inequality, so the invariant with this $d$ implies all invariants with the same
$c$ and other $d$'s in this interval. The boundary is achieved when one of the two
conditions holds with equality.


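Example 2 (Fibonacci, Part 2). We continue with Example 1. Recall that $\mu = \frac{1+\sqrt{5}}{2}$
and $c = \left[c_1, \frac{1+\sqrt{5}}{2} c_1\right]^T$; in this case, constraints (3) and (5) (with $\xi = 0$) read

$$
-\frac{3+\sqrt{5}}{2}\,c_1 \ge d \quad\text{and}\quad \frac{-1+\sqrt{5}}{2}\,d \ge 0,
$$

hence yield $0 \le d \le -\frac{3+\sqrt{5}}{2}\,c_1$. The free variable $c_1$ must be negative here,
so we choose $c_1 = -2$ and thus $c = [-2, -1-\sqrt{5}]^T$ and $0 \le d \le 3+\sqrt{5}$; there are two
boundary values $d = 0$ and $d = 3+\sqrt{5}$, where $d = 3+\sqrt{5}$ leads to the strongest invariant:

$$
\mu = \frac{1+\sqrt{5}}{2}:\quad -2x_1 - (1+\sqrt{5})\,x_2 + 3 + \sqrt{5} \le 0.
$$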
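Proposition 2 and Example 2 can be replayed numerically. The sketch below returns the interval of feasible $d$, picks its maximal value, and then tests the resulting Fibonacci invariant on the first 40 states of the loop. It is our own helper, and it takes the scalar $f^T \cdot (R^T)^{-1}_L \cdot c$ as a precomputed input. The test on finitely many states is only a sanity check, not a proof of invariance.

```python
import math
from typing import Optional, Sequence, Tuple


def d_interval(mu: float, c: Sequence[float], f_lc: float,
               b: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Feasible d of Proposition 2: d <= f_lc and (mu - 1) * d >= b^T c.

    f_lc stands for the precomputed scalar f^T (R^T)^{-1}_L c. Returns
    (low, high), where low may be -inf, or None when no d is feasible.
    """
    bc = sum(bi * ci for bi, ci in zip(b, c))
    low, high = -math.inf, f_lc
    if mu > 1:
        low = bc / (mu - 1)
    elif mu < 1:
        high = min(high, bc / (mu - 1))  # dividing by mu - 1 < 0 flips the inequality
    elif bc > 0:
        return None                      # mu == 1: the condition reads 0 >= b^T c
    return (low, high) if low <= high else None


# Example 2 with c1 = -2: R = I (so (R^T)^{-1}_L = I), f = [-1, -1], b = 0.
s5 = math.sqrt(5)
mu = (1 + s5) / 2
c = [-2.0, -1.0 - s5]
f = [-1.0, -1.0]
interval = d_interval(mu, c, sum(fi * ci for fi, ci in zip(f, c)), [0.0, 0.0])
d = interval[1]   # the maximal d gives the strongest invariant

# Sanity check (not a proof): the invariant holds on the first 40 Fibonacci pairs.
x1, x2 = 1.0, 1.0
holds = True
for _ in range(40):
    holds = holds and c[0] * x1 + c[1] * x2 + d <= 1e-9
    x1, x2 = x2, x1 + x2
print(interval, holds)
```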
4.3 Loops with Guard: Single-Constraint Case


Here we study loops with a non-tautological guard. First of all, the eigenvalue
method of Sect. 4.2 applies to this case as well; thus for the rest of Sect. 4, we
always assume that $\mu$ is not an eigenvalue of $T$ (and $c$ is not an eigenvector
of $T^T$ either) and aim for invariants other than the ones from the eigenvectors.

Let us start with the case where the loop guard consists of only one affine
inequality:

$$
\begin{array}{l}
\textbf{initial condition}\ \ \theta : R \cdot x + f \le 0\\
\textbf{while}\ \ p^T \cdot x + q \le 0\ \ \textbf{do}\ \ x' = T \cdot x + b;\ \ \textbf{end}
\end{array}
\qquad (\diamond')
$$

where $p$ is an $n$-dimensional real vector and $q$ is a real number.
We again take three steps to compute the invariants; these steps differ
from those in the previous case:


1. we derive a formula to compute $c$ in terms of $\mu$, so that every non-negative real
value $\mu$ yields a corresponding $c$;

2. however, not all $\mu$'s produce invariants that satisfy all constraints (1)–(5);
we determine the feasible domain of $\mu$ that does so;

3. we select finitely many $\mu$'s from this feasible domain that provide tight
invariants (the meaning of tightness will be defined later). For every single $\mu$,
we also determine the feasible domain of $d$ and its optimal value.


Step 1. We first establish the relationship between $\mu$ and c through the
constraints. The initiation constraints are still (1) (2) (3), while the consecution
constraints (4) (5) become:
$$\mu\cdot c - T^{T}\cdot c + \xi\cdot p = 0 \qquad (4')$$

$$(\mu - 1)d - b^{T}\cdot c + \xi\cdot q = \Delta_2 \ge 0 \qquad (5')$$


where the matrix P in (4) degenerates to the vector $p^{T}$, and the vectors q, $\xi$ in (5)
each have just one component q, $\xi$ here. Note that $\xi$ is a non-negative parameter.

In contrast to Sect. 4.2, we assume that $\mu$ is not an eigenvalue of T and that
$\xi \ne 0$. For such $\mu$, we have a new formula to compute c:


Proposition 3. For any non-trivial invariant $c^{T}\cdot x + d \le 0$ of the loop (◇'),
c is given by


$$c = \xi\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p \quad\text{with } \xi > 0; \qquad (6)$$


when $\mu$ is fixed, c's with different $\xi$'s are proportional to each other and yield
equivalent invariants.


Proof. Since $\mu$ is not an eigenvalue of T, the matrix $\mu\cdot I - T^{T}$ is invertible;
thus (4') is equivalent to


$$(\mu\cdot I - T^{T})\cdot c = -\xi\cdot p \iff c = \xi\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p.$$
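Example 3 (Fibonacci, Part 3). We add a loop guard $x_1 \le 10$ to Example 1:


initial condition $\theta: R\cdot x + f \le 0$ (the same as in Example 1)
while $p^{T}\cdot x + q = [1, 0]\cdot[x_1, x_2]^{T} - 10 \le 0$ do
$[x_1', x_2']^{T} = T\cdot x + b = \begin{bmatrix}0 & 1\\ 1 & 1\end{bmatrix}\cdot\begin{bmatrix}x_1\\ x_2\end{bmatrix} + 0$; end

and search for more invariants. The formula (6) here reads


$$c = \xi\cdot\begin{bmatrix}-\mu & 1\\ 1 & 1-\mu\end{bmatrix}^{-1}\cdot\begin{bmatrix}1\\ 0\end{bmatrix} = \frac{\xi}{\mu^2 - \mu - 1}\begin{bmatrix}1-\mu\\ -1\end{bmatrix}.$$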
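Formula (6) can be evaluated exactly with rational arithmetic. The Python sketch below is our own illustration, not the authors' Sage implementation: it computes $c = \xi\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p$ for a two-variable loop and checks it against the closed form of Example 3, assuming the Fibonacci matrix $T = [[0,1],[1,1]]$ and $p = [1, 0]^{T}$; `solve2`, `c_of_mu` and `closed_form` are hypothetical helper names.

```python
from fractions import Fraction


def solve2(A, y):
    """Solve the 2x2 linear system A z = y exactly (A must be invertible)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("matrix is singular: mu is an eigenvalue")
    return [(A[1][1] * y[0] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - A[1][0] * y[0]) / det]


def c_of_mu(T, p, mu, xi=Fraction(1)):
    """Formula (6): c = xi * (T^T - mu I)^{-1} p for a 2-variable loop."""
    A = [[T[0][0] - mu, T[1][0]],
         [T[0][1], T[1][1] - mu]]          # T^T - mu * I
    return [xi * zi for zi in solve2(A, p)]


# Fibonacci with guard x1 <= 10 (Example 3): p = [1, 0].
T = [[0, 1], [1, 1]]
p = [Fraction(1), Fraction(0)]


def closed_form(mu):
    """Closed form from Example 3 with xi = 1: [1 - mu, -1] / (mu^2 - mu - 1)."""
    den = mu * mu - mu - 1
    return [(1 - mu) / den, Fraction(-1) / den]


for mu in [Fraction(0), Fraction(1, 2), Fraction(5, 3), Fraction(3)]:
    assert c_of_mu(T, p, mu) == closed_form(mu)
```

Using `Fraction` keeps the comparison exact, so the check confirms the matrix inversion symbolically at each sample point rather than up to rounding.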


Step 2. With formula (6) in hand, every non-negative value $\mu$ gives us a
vector c; the next step is to find those $\mu$'s for which (1) (2) (3) (5') are all satisfied.
We call this set the feasible domain of $\mu$.

Notice that (3) and (5') are two inequalities both containing d. When the
value of $\mu$ changes, (3) and (5') may conflict with each other and
hence make no invariant available. So the feasible domain consists of those $\mu$'s
that make the two inequalities compatible with each other:


Proposition 4. For the loop (◇'), any feasible $\mu$ falls in $[0,1) \cup (K \cap [1,+\infty))$,
where K is the solution set to the following rational inequality in $\mu$ (which we
call the 'compatibility condition'):


$$b^{T}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p - q \le (\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p. \qquad (7)$$
Proof. We multiply both sides of (3) by $(\mu - 1)$ and get


$$(\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot c \le (\mu - 1)d \quad\text{when } 0 \le \mu < 1 \qquad (3')$$
$$(\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot c \ge (\mu - 1)d \quad\text{when } \mu > 1 \qquad (3'')$$


Comparing them with (5'), we see that (3') and (5') do not conflict with each other because
both bound $(\mu - 1)d$ from below. However, (3'') and (5')
are two inequalities of opposite directions; together they must satisfy


$$b^{T}\cdot c - \xi\cdot q \le (\mu - 1)d \le (\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot c$$


to be compatible. Substituting c by (6) in the above inequality and cancelling out
$\xi > 0$, we obtain the desired inequality:


$$b^{T}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p - q \le (\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p.$$


Every $\mu$ from $[0, 1)$ and from $K \cap [1, +\infty)$ leads to a non-trivial invariant satisfying
all constraints (1) (2) (3) (4') (5').
Example 4 (Fibonacci, Part 4). Let us compute the feasible domain of $\mu$ for
Example 3. Inequality (5') is $(\mu - 1)d \ge 10\xi$; inequality (3'') is


$$(\mu - 1)\cdot[-1, -1]\cdot\frac{\xi}{\mu^2 - \mu - 1}\begin{bmatrix}1-\mu\\ -1\end{bmatrix} = \frac{\xi\,\mu(\mu - 1)}{\mu^2 - \mu - 1} \ge (\mu - 1)d \quad(\text{when } \mu > 1).$$


We combine them to form the compatibility condition (7) as


$$10 \le \frac{\mu(\mu - 1)}{\mu^2 - \mu - 1} \iff \frac{9(\mu - 5/3)(\mu + 2/3)}{\mu^2 - \mu - 1} \le 0 \quad(\text{when } \mu > 1).$$


Its solution domain is $((1+\sqrt{5})/2, 5/3]$. Thus, by Proposition 4, the feasible domain
of $\mu$ is $[0,1) \cup ((1+\sqrt{5})/2, 5/3]$.
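To see the compatibility condition of Example 4 at work, the sketch below (an illustration in exact rational arithmetic, not the authors' code) evaluates both sides of (7) with $b = 0$, $q = -10$ and $f^{T}\cdot(R^{T})^{-1} = [-1, -1]$ as read off Example 4, confirms that the condition agrees with its factored form on a grid of rational $\mu > 1$, and reports the smallest and largest feasible grid points. The names `rhs`, `compatible` and `factored` are hypothetical.

```python
from fractions import Fraction

LHS = Fraction(10)      # b^T (T^T - mu I)^{-1} p - q with b = 0 and q = -10


def rhs(mu):
    """(mu - 1) f^T (R^T)^{-1} c, with f^T (R^T)^{-1} = [-1, -1] and c from (6), xi = 1."""
    den = mu * mu - mu - 1
    c = [(1 - mu) / den, Fraction(-1) / den]
    return (mu - 1) * (-c[0] - c[1])


def compatible(mu):
    """Compatibility condition (7) for Example 4 (meaningful for mu > 1)."""
    return LHS <= rhs(mu)


def factored(mu):
    """Factored form: 9 (mu - 5/3)(mu + 2/3) / (mu^2 - mu - 1) <= 0."""
    num = 9 * (mu - Fraction(5, 3)) * (mu + Fraction(2, 3))
    return num / (mu * mu - mu - 1) <= 0


# Scan rational mu in (1, 3]; mu^2 - mu - 1 has no rational roots, so no division by zero.
samples = [Fraction(n, 100) for n in range(101, 301)]
feasible = [mu for mu in samples if compatible(mu)]
print(min(feasible), max(feasible))     # 81/50 83/50
```

Both reported values (1.62 and 1.66) lie in $((1+\sqrt{5})/2, 5/3]$, consistent with the feasible domain derived above; a grid scan only illustrates the set and does not replace the exact boundary computation.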


Step 3. Proposition 4 provides us with a continuum of candidates for $\mu$ and thus
produces infinitely many legitimate invariants. We would like to find a basis consisting
of finitely many invariants such that all invariants are non-negative linear
combinations of the basis; however, this idea does not work out, for reasons
explained thoroughly in the full version of this paper [25, Appendix A.1 and
A.2]. Instead, we impose a weaker form of optimality called tightness, which comes
from the equality cases of constraints (3) (5'):


$$f^{T}\cdot(R^{T})^{-1}\cdot c - d = \Delta_1 = 0$$
$$(\mu - 1)d - b^{T}\cdot c + \xi\cdot q = \Delta_2 = 0$$
We call an invariant tight, and the corresponding $\mu$ a tight choice, when both
equalities are achieved:


– $\Delta_1 = 0$: the invariant is tight at the initial state, i.e., it holds with
equality at the initial state;
– $\Delta_2 = 0$: the invariant stays as close to being tight as possible at later iterations.


The non-tight choices can be kept as back-ups for invariant generation. The
tight choices are characterized by the following proposition:


Proposition 5. For the loop (◇'), the tight choices of $\mu$ consist of 0 and the
positive real roots of the following rational equation:


$$b^{T}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p - q = (\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p. \qquad (8)$$
Note that these roots are also the boundary points of the intervals in K defined
in Proposition 4.


Proof. Recall from Proposition 2 that constraints (3) and (5) form the two boundaries of the
domain of d, which cannot be attained simultaneously for loops with a
tautological guard. Nevertheless, for loops with a guard, we have
extra freedom in $\mu$, which allows us to set $\Delta_1 = \Delta_2 = 0$:


$$f^{T}\cdot(R^{T})^{-1}\cdot c = d \;\wedge\; (\mu - 1)d = b^{T}\cdot c - \xi\cdot q$$
$$\Longrightarrow\; b^{T}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p - q = (\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot p.$$
Equation (8) is just the case in which (7) holds with equality; hence it is a rational
equation in $\mu$ with finitely many roots. These roots are also the boundary
points of K, since K is the solution domain of (7). Besides the roots of (8),
$\mu = 0$ is also a boundary point of the feasible domain; its corresponding invariant
reflects the feature of the loop guard itself. Thus we add it to the list of tight
choices.


With $\mu$ determined and c fixed up to a scaling factor, the last thing that remains
is to determine the optimal d. The strategy here is similar to Proposition 2:


Proposition 6. Suppose $\mu$ is from the feasible domain and c is given by Proposition 3.
Then the optimal value of d is determined by one of the two choices
below:

$$b^{T}\cdot c - \xi\cdot q = (\mu - 1)d \quad\text{or}\quad f^{T}\cdot(R^{T})^{-1}\cdot c = d.$$


The proof is omitted here and can be found in our full version [25].


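Example 5 (Fibonacci, Part 5). Recall that


$$c = \frac{\xi}{\mu^2 - \mu - 1}\begin{bmatrix}1-\mu\\ -1\end{bmatrix}$$
and the feasible domain of $\mu$ is $[0,1) \cup ((1+\sqrt{5})/2, 5/3]$.
We compute the tight choices of $\mu$ and the tight invariants. Equation (8) here is
$$\frac{\mu(\mu - 1)}{\mu^2 - \mu - 1} - 10 = \frac{-9\mu^2 + 9\mu + 10}{\mu^2 - \mu - 1} = -\frac{9(\mu - 5/3)(\mu + 2/3)}{\mu^2 - \mu - 1} = 0,$$


which has only one positive root $\mu = 5/3$. By Proposition 5 and Proposition 6, we
get two invariants:
$\mu = 0$: $-x_1 + x_2 - 10 \le 0$;
$\mu = 5/3$: $-2x_1 - 3x_2 + 5 \le 0$.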
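The two tight invariants of Example 5 can be reproduced by applying Proposition 6 literally: for each tight choice of $\mu$, compute c from (6), form the two candidate values of d, and keep the largest one that satisfies both (3) and (5'). The Python sketch below does this in exact arithmetic and then simulates the guarded loop. As before, $f^{T}\cdot(R^{T})^{-1} = [-1, -1]$ is taken from Example 4, while the start state $(1, 1)$ and the scaling $\xi = 1/3$ for $\mu = 5/3$ are our own assumptions; `optimal_d` is a hypothetical helper.

```python
from fractions import Fraction

Q = Fraction(-10)        # guard x1 - 10 <= 0: p = [1, 0], q = -10; b = 0


def c_of_mu(mu, xi):
    """Formula (6) for the Fibonacci loop: c = xi / (mu^2 - mu - 1) * [1 - mu, -1]."""
    den = mu * mu - mu - 1
    return [xi * (1 - mu) / den, -xi / den]


def optimal_d(mu, xi):
    """Proposition 6: d is one of two boundary values; keep the largest one that
    satisfies (3) d <= f^T (R^T)^{-1} c and (5') (mu - 1) d >= b^T c - xi q."""
    c = c_of_mu(mu, xi)
    init_bound = -(c[0] + c[1])          # f^T (R^T)^{-1} c with f^T (R^T)^{-1} = [-1, -1]
    rhs5 = -xi * Q                       # b^T c - xi q, since b = 0
    candidates = [init_bound]
    if mu != 1:
        candidates.append(rhs5 / (mu - 1))   # equality case of (5')
    ok = [d for d in candidates if d <= init_bound and (mu - 1) * d >= rhs5]
    return c, max(ok)


inv0 = optimal_d(Fraction(0), Fraction(1))          # c = [-1, 1],  d = -10
inv1 = optimal_d(Fraction(5, 3), Fraction(1, 3))    # c = [-2, -3], d = 5


def holds(inv, x):
    c, d = inv
    return c[0] * x[0] + c[1] * x[1] + d <= 0


# Simulate the guarded loop from the assumed start state (1, 1).
x = (1, 1)
trace = [x]
while x[0] <= 10:                        # loop guard x1 <= 10
    x = (x[1], x[0] + x[1])
    trace.append(x)
assert all(holds(inv0, s) and holds(inv1, s) for s in trace)
```

For $\mu = 5/3$ both candidates coincide ($d = 5$), i.e., both $\Delta_1 = 0$ and $\Delta_2 = 0$, which is exactly the tightness described above; for $\mu = 0$ only the (5') candidate survives.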


4.4 Loops with Guard: Multi-constraint Case 


After settling the single-constraint loop guard case, we consider the more general
loop guard, which is a conjunction of multiple affine constraints:


initial condition $\theta: R\cdot x + f \le 0$   (◇'')
while $P\cdot x + q \le 0$ do $x' = T\cdot x + b$; end


where the loop guard $P\cdot x + q \le 0$ contains m affine inequalities.
We can easily generalize the results of Sect. 4.3 to this case. First of all, we
generalize Proposition 3: one simply needs to modify formula (6) into


$$c = (T^{T} - \mu\cdot I)^{-1}\cdot P^{T}\cdot\xi \quad\text{with } \xi \ge 0 \qquad (6')$$


where $\xi$ is a free non-negative m-dimensional vector parameter. With a fixed $\mu$,
we take $\xi$ to traverse all vectors in the standard basis $\{e_1, \ldots, e_m\}$ to get m
conjunctive invariants.

Next, we generalize Proposition 4, which describes the feasible domain of $\mu$:
Proposition 7. For the loop (◇''), the feasible domain of $\mu$ is $[0,1) \cup (K \cap [1, +\infty))$,
where K is the solution set to the following generalized compatibility
condition:

$$b^{T}\cdot c - q^{T}\cdot\xi \le (\mu - 1)d \le (\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot c;$$


substituting c by (6') and taking $\xi$ to traverse all vectors in the standard basis (in
order for all constraints in the loop guard to be satisfied by the invariant), we
obtain the above condition fully decoded as m conjunctive inequalities:


$$u(\mu) = b^{T}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot P^{T} - q^{T} \;\le\; w(\mu) = (\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot(T^{T} - \mu\cdot I)^{-1}\cdot P^{T} \qquad (7')$$


where $u(\mu)$, $w(\mu)$ are two m-dimensional vector functions of $\mu$. The meaning of
(7') is that the i-th component of $u(\mu)$ is no larger than the i-th component of
$w(\mu)$ for all $1 \le i \le m$; when $m = 1$, it reduces to (7).
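For a general loop (◇''), condition (7') can be checked componentwise by solving one linear system per guard row. The following sketch uses hypothetical helper names and exact Gauss–Jordan elimination over `fractions.Fraction`; it assumes R is square and invertible and that $\mu$ is not an eigenvalue of T. It returns the vectors $u(\mu)$ and $w(\mu)$ and tests $u_i(\mu) \le w_i(\mu)$ for every i, which corresponds to letting $\xi$ range over the standard basis vectors.

```python
from fractions import Fraction


def solve(A, y):
    """Solve A z = y exactly by Gauss-Jordan elimination (A square and invertible)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(y[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)  # StopIteration if singular
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def u_w(T, b, P, q, R, f, mu):
    """Components of u(mu) and w(mu) in (7'); row i uses xi = e_i, so P^T xi is row i of P."""
    n = len(T)
    A = [[Fraction(T[j][i]) - (mu if i == j else 0) for j in range(n)]
         for i in range(n)]                        # T^T - mu I
    u, w = [], []
    for i, row in enumerate(P):
        c = solve(A, row)                          # (T^T - mu I)^{-1} P^T e_i
        lam = solve(transpose(R), c)               # (R^T)^{-1} c
        u.append(sum(bj * cj for bj, cj in zip(b, c)) - q[i])
        w.append((mu - 1) * sum(fj * lj for fj, lj in zip(f, lam)))
    return u, w


def compatible(T, b, P, q, R, f, mu):
    """Generalized compatibility condition (7'): u_i(mu) <= w_i(mu) for every i."""
    u, w = u_w(T, b, P, q, R, f, mu)
    return all(ui <= wi for ui, wi in zip(u, w))


# Fibonacci loop with guard x1 <= 10 (m = 1), assuming R = I and f = [-1, -1].
FIB = dict(T=[[0, 1], [1, 1]], b=[0, 0], P=[[1, 0]], q=[-10],
           R=[[1, 0], [0, 1]], f=[-1, -1])
print([compatible(mu=Fraction(k, 60), **FIB) for k in (96, 98, 100, 102)])
# [False, True, True, False]
```

The assumed $R = I$, $f = [-1, -1]^{T}$ reproduces $f^{T}\cdot(R^{T})^{-1} = [-1, -1]$ from Example 4, so on this single-row instance the values 8/5 and 17/10 (outside $((1+\sqrt{5})/2, 5/3]$) are rejected, while 49/30 and 5/3 are accepted.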


At last, we consider the tight choices of $\mu$. The first idea that comes to mind
is to repeat Proposition 5: setting $\Delta_1 = \Delta_2 = 0$ for arbitrary $\xi$ such that the
generalized compatibility condition holds with equality, i.e., $u(\mu) = w(\mu)$; however,
this is a conjunction of m rational equations and probably has no solution.

Thus we use a different idea: recall that in the single-constraint case, the
tight choices are also the (positive) boundary points of K, along with 0; so we
adopt this property as the definition in the multi-constraint case:


Definition 1. For the loop (◇''), the tight choices of $\mu$ consist of 0 and the
(positive) boundary points of the domain K defined in Proposition 7.


The generalized compatibility condition (7') contains m inequalities; at each
(positive) boundary point of K, at least one inequality holds with equality and
all other inequalities are satisfied (equivalently, $\Delta_1 = \Delta_2 = 0$ is achieved for at
least one non-trivial evaluation of the free vector parameter $\xi$). This is indeed a
natural generalization of Proposition 5.


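Example 6. We consider the loop:


initial condition $\theta: R\cdot x + f = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}\cdot\begin{bmatrix}x_1\\ x_2\end{bmatrix} + \begin{bmatrix}-1\\ -1\end{bmatrix} \le 0$
while $P\cdot x + q = \begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix}\cdot\begin{bmatrix}x_1\\ x_2\end{bmatrix} + \begin{bmatrix}-10\\ -5\end{bmatrix} \le 0$ do
$x' = T\cdot x + b = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix}\cdot\begin{bmatrix}x_1\\ x_2\end{bmatrix} + \begin{bmatrix}1\\ -1\end{bmatrix}$; end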


There is one eigenvalue $\mu = 1$ with geometric multiplicity 2; from it we obtain three
independent invariants:


$x_1 + x_2 - 2 \le 0,\quad x_1 + x_2 - 2 \ge 0;\quad -x_1 + x_2 \le 0.$
Next, we find the other invariants from tight $\mu$'s. In this case (7') reads
$9(\mu - 1) \le 1 \wedge 6(\mu - 1) \le 1$ (when $\mu > 1$). Then $K \cap (1, +\infty) = (1, 10/9] \cap (1, 7/6] = (1, 10/9]$,
and the feasible domain of $\mu$ is $[0, 1) \cup (1, 10/9]$. The tight choices are 0 and 10/9 (taking
$\xi$ to be $[1, 0]^{T}$ and $[0, 1]^{T}$, respectively, yields the two conjunctive invariants for each
$\mu$):
$\mu = 0$: $x_1 - 10 \le 0 \;\wedge\; -x_2 - 5 \le 0$;
$\mu = 10/9$: $-x_1 + 1 \le 0 \;\wedge\; x_2 - 1 \le 0$.


5 Generalizations 


In this section, we extend the theory developed in Sect. 4 in two directions. In
one direction, we consider the invariants $c^{T}\cdot x + d \le 0$ for affine loops in
the general form (†): we derive the relationship between $\mu$ and c, as well as the
feasible domain and tight choices of $\mu$. In the other direction, we stick to
single-branch affine loops with deterministic updates and tautological guard (◇),
yet generalize the invariants to the bidirectional-inequality form $d_1 \le c^{T}\cdot x \le d_2$; we
apply the eigenvalue method to this case to solve for the invariants. At the end of
the section, we also briefly discuss some other possible generalizations.


5.1 Affine Loops with Non-deterministic Updates 


In Sect. 4, we handled loops with deterministic updates; here we generalize
the results to the non-deterministic case of the form (†). We focus on single-branch
loops here, because multi-branch ones can be handled similarly by
taking the conjunction of all branches, as illustrated in the full version of this
paper [25, Appendix A.3].


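initial condition $\theta: R\cdot x + f \le 0$   (†)
while $P\cdot x + q \le 0$ do $T\cdot x - T'\cdot x' + b \le 0$; end


For this general form, the initiation constraints are still (1) (2) (3), while the
consecution constraints from the Farkas table (*) are


$$\mu\cdot c + P^{T}\cdot\xi + T^{T}\cdot\eta = 0 \qquad (9)$$

$$-(T')^{T}\cdot\eta = c \qquad (10)$$

$$(\mu - 1)d + q^{T}\cdot\xi + b^{T}\cdot\eta = \Delta_2 \ge 0 \qquad (11)$$

with $\xi, \eta \ge 0$. The relationship between c and $\eta$ is given by (10); plugging it into (9)
yields

$$(T^{T} - \mu\cdot(T')^{T})\cdot\eta + P^{T}\cdot\xi = 0. \qquad (9')$$


Hence for any non-trivial invariant $c^{T}\cdot x + d \le 0$ of this loop (†), we have
$c = -(T')^{T}\cdot\eta$, where $\eta$ is characterized differently in the following three cases: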
1. T and T' are square matrices and the loop guard is 'true'. In this case, we take
$\xi = 0$ in (9') and easily see that $\mu$ must be a root of $\det(T^{T} - \mu\cdot(T')^{T}) = 0$
and $\eta$ is a kernel vector of the matrix $T^{T} - \mu\cdot(T')^{T}$.

2. T and T' are square matrices and the loop guard is non-tautological. In this
case, we set $\mu$ to values other than the roots of $\det(T^{T} - \mu\cdot(T')^{T}) = 0$,
so that the inverse matrix $(T^{T} - \mu\cdot(T')^{T})^{-1}$ exists; multiplying (9') by it, we
get $\eta(\mu) = -(T^{T} - \mu\cdot(T')^{T})^{-1}\cdot P^{T}\cdot\xi$.

3. Neither T nor T' is a square matrix. In this case, we need to use Gaussian
elimination (with parameters) to solve (9'). By linear algebra, the
solution $\eta(\mu)$ contains a 'homogeneous term' (which does not involve $\xi$
but possibly some free variables $\tilde{\eta} = [\eta_{i_1}, \ldots, \eta_{i_k}]^{T}$) and a 'non-homogeneous term'
(which contains $\xi$ linearly). Thus $\eta(\mu)$ can be written in parametric vector
form as $M(\mu)\cdot\tilde{\eta} + N(\mu)\cdot\xi$, where $M(\mu)$, $N(\mu)$ are matrix functions of $\mu$
only.
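Case 2 can be cross-checked against the deterministic setting of Sect. 4. The Python sketch below (hypothetical helper names, exact arithmetic, 2×2 matrices only) computes $\eta(\mu) = -(T^{T} - \mu\cdot(T')^{T})^{-1}\cdot P^{T}\cdot\xi$ and then $c = -(T')^{T}\cdot\eta$ from (10). With $T' = I$, the constraint $T\cdot x - T'\cdot x' + b \le 0$ reads $x' \ge T\cdot x + b$, and c reduces algebraically to formula (6'); the Fibonacci matrix and the guard rows used here are illustrative assumptions.

```python
from fractions import Fraction


def solve2(A, y):
    """Exact solution of the 2x2 system A z = y (A invertible)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * y[0] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - A[1][0] * y[0]) / det]


def tr(M):
    """Transpose of a 2x2 matrix."""
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]


def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def eta_case2(T, Tp, P, xi, mu):
    """Case 2: eta(mu) = -(T^T - mu (T')^T)^{-1} P^T xi."""
    Tt, Tpt = tr(T), tr(Tp)
    A = [[Fraction(Tt[i][j]) - mu * Tpt[i][j] for j in range(2)] for i in range(2)]
    return [-v for v in solve2(A, matvec(tr(P), xi))]


def c_from_eta(Tp, eta):
    """Constraint (10): c = -(T')^T eta."""
    return [-v for v in matvec(tr(Tp), eta)]


# With T' = I, c should coincide with formula (6'): (T^T - mu I)^{-1} P^T xi.
T, I = [[0, 1], [1, 1]], [[1, 0], [0, 1]]
P, xi, mu = [[1, 0], [0, -1]], [1, 0], Fraction(5, 3)
c = c_from_eta(I, eta_case2(T, I, P, xi, mu))
print(c)    # [Fraction(-6, 1), Fraction(-9, 1)]
```

The sketch does not check the side condition $\eta(\mu) \ge 0$ that defines J below; a complete implementation would apply that test before accepting a value of $\mu$.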

For Case 2 and Case 3, we have a continuum of candidates for $\mu$. The feasible
domain of $\mu$ is given by $([0, 1) \cup (K \cap (1, +\infty))) \cap J$, where K is the solution
set to the following compatibility condition (obtained by combining constraints
(3'') and (11)):

$$b^{T}\cdot\eta(\mu) + q^{T}\cdot\xi \ge (\mu - 1)f^{T}\cdot(R^{T})^{-1}\cdot(T')^{T}\cdot\eta(\mu)$$


and J is the solution set to the constraints $\eta(\mu) \ge 0$. Here both $\tilde{\eta}$ and $\xi$, as free
non-negative vector parameters, are taken to traverse all standard basis vectors,
in the same way as in Proposition 7. The tight choices of $\mu$ consist of 0 and
the positive boundary points of $K \cap J$, in the same sense as Definition 1.


5.2 An Extension to Bidirectional Affine Invariants 


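Here we restrict ourselves to single-branch affine loops with deterministic updates
and tautological loop guard (◇), but aim for invariants of the bidirectional-inequality
form $d_1 \le c^{T}\cdot x \le d_2$. This is actually the conjunction of two affine
inequalities: $\beta_1: -c^{T}\cdot x + d_1 \le 0 \;\wedge\; \beta_2: c^{T}\cdot x - d_2 \le 0$. We have the following
proposition:


Proposition 8. For any bidirectional invariant $d_1 \le c^{T}\cdot x \le d_2$ of the loop (◇),
c must be an eigenvector of $T^{T}$ with a negative eigenvalue.


Proof. We can easily write down the initiation condition $\theta \Rightarrow (\beta_1 \wedge \beta_2)$ and the
corresponding constraints (with $\lambda$, $\hat{\lambda}$ being two different vector parameters):
$$R^{T}\cdot\lambda = c,\;\; f^{T}\cdot\lambda + d_2 = \Delta_1 \ge 0; \qquad R^{T}\cdot\hat{\lambda} = -c,\;\; f^{T}\cdot\hat{\lambda} - d_1 = \hat{\Delta}_1 \ge 0.$$
However, there are two possible ways to propose the consecution condition:


$$(\beta_1 \wedge \tau \Rightarrow \beta_1' \text{ and } \beta_2 \wedge \tau \Rightarrow \beta_2') \quad\text{or}\quad (\beta_1 \wedge \tau \Rightarrow \beta_2' \text{ and } \beta_2 \wedge \tau \Rightarrow \beta_1')$$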
If we choose the first one, there will be nothing different from what we did
in Sect. 4.2. Thus we choose the second one: making the two inequalities induct
each other. Hence the Farkas tables are


$\mu$ | $-c^{T}\cdot x + d_1 \le 0$ ($\beta_1$)
$\Delta_2$ | $-1 \le 0$
$-c$ | $T\cdot x - x' + b = 0$ ($\tau$)
   | $c^{T}\cdot x' - d_2 \le 0$ ($\beta_2'$)

$\hat{\mu}$ | $c^{T}\cdot x - d_2 \le 0$ ($\beta_2$)
$\hat{\Delta}_2$ | $-1 \le 0$
$c$ | $T\cdot x - x' + b = 0$ ($\tau$)
   | $-c^{T}\cdot x' + d_1 \le 0$ ($\beta_1'$)


We write out the constraints of consecution:


$$-\mu\cdot c = T^{T}\cdot c = -\hat{\mu}\cdot c \qquad (12)$$
$$\mu d_1 + d_2 - b^{T}\cdot c = \Delta_2 \ge 0, \qquad -\hat{\mu} d_2 - d_1 + b^{T}\cdot c = \hat{\Delta}_2 \ge 0$$


The proposition is verified by (12), since $\mu, \hat{\mu} \ge 0$.


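Example 7 (Fibonacci, Part 6). Recall that in this example we have the negative
eigenvalue $(1-\sqrt{5})/2$. It yields the eigenvector $c = [c_1, \frac{1-\sqrt{5}}{2}c_1]^{T}$. The other
constraints are computed as:


$$-\frac{3-\sqrt{5}}{2}c_1 + d_2 = \Delta_1 \ge 0, \qquad \frac{3-\sqrt{5}}{2}c_1 - d_1 = \hat{\Delta}_1 \ge 0,$$
$$-\frac{1-\sqrt{5}}{2}d_1 + d_2 = \Delta_2 \ge 0, \qquad \frac{1-\sqrt{5}}{2}d_2 - d_1 = \hat{\Delta}_2 \ge 0.$$


If we choose $c_1 = 2$ and $\Delta_1 = 0 = \hat{\Delta}_2$ (or $c_1 = -2$ and $\hat{\Delta}_1 = 0 = \Delta_2$), we get the invariant


$$\mu = |(1-\sqrt{5})/2|:\quad 2(2-\sqrt{5}) \le 2x_1 + (1-\sqrt{5})x_2 \le 3-\sqrt{5},$$


which reflects the 'golden ratio' property of the Fibonacci numbers.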
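The bidirectional invariant of Example 7 can be checked along a concrete run. Assuming again the Fibonacci update and the start state $(1, 1)$ (our assumptions, as in the earlier sketches), the Python code below tracks $c^{T}\cdot x$ for $c = [2, 1-\sqrt{5}]^{T}$ and confirms that it stays within $[2(2-\sqrt{5}), 3-\sqrt{5}]$.

```python
import math

# Assumptions: Fibonacci update x1' = x2, x2' = x1 + x2 and start state (1, 1).
psi = (1 - math.sqrt(5)) / 2                 # the negative eigenvalue of T^T
c = [2.0, 1.0 - math.sqrt(5)]                # eigenvector [c1, psi * c1] with c1 = 2
d1, d2 = 2 * (2 - math.sqrt(5)), 3 - math.sqrt(5)

x = (1, 1)
values = []                                  # c^T x along the run
for _ in range(30):
    values.append(c[0] * x[0] + c[1] * x[1])
    x = (x[1], x[0] + x[1])

# Because T^T c = psi * c, every step multiplies c^T x by psi: the sign alternates
# and the magnitude shrinks, so the extremes are attained at steps 0 and 1.
print(min(values), max(values))
```

This also explains the choice of tight constraints in the example: the upper bound is attained at the initial state and the lower bound after one iteration.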


Remark 1. The generalizations of bidirectional affine invariants to loops
with non-tautological guards or multiple branches are feasible but come with some
restrictions. The main restriction is that we need to assume the
affine loop guard to also be bidirectional for our approach to bidirectional
affine invariants to work. The issue of multiple branches is not critical, as the bidirectional
invariants can be derived in almost the same way as single-inequality
invariants (illustrated in the full version [25, Appendix A.3]), the only difference
being the adaptation to bidirectional inequalities.


5.3 Other Possible Generalizations 


Integer-valued Variables. One direction is to transfer some of the results for 
affine loops over real-valued variables to those over integer-valued variables. Our 
approach is based on Farkas' lemma, which is dedicated to real-valued variables,
and thus can only provide a sound but not exact treatment for integer-valued variables.
An exact treatment for integer-valued variables would require Presburger
arithmetic [16] rather than Farkas' lemma.
Strict-inequality Invariants. We handle non-strict-inequality affine
invariants in this work. It is natural to also consider affine invariants of the strict-inequality
form. For strict inequalities, we could utilize an extended version of
Farkas' lemma in [6, Corollary 1], so that strict inequalities can be generated by
either relaxing the non-strict ones obtained from our method or restricting the $\mu$
value to be positive. Since the Motzkin transposition theorem is a standard theorem
for handling strict inequalities, we believe that it can also achieve
similar results, but it may require more tedious manipulations.


6 Approximation of Eigenvectors through Continuity 


In Sect. 4.2 and Sect. 5.2, we need to solve the characteristic polynomial of the
transition matrix to get eigenvalues, while general polynomials of degree ≥ 5
have no algebraic solution formula, by the Abel–Ruffini theorem. We can
develop a number sequence $\{\lambda_i\}$ that approximates the eigenvalue $\lambda$ through root-finding
algorithms; however, we cannot approximate the eigenvector of $\lambda$ by
solving the kernel of $T^{T} - \lambda_i\cdot I$, since it has a trivial kernel. In dimensions
≥ 5, i.e., when an explicit formula for eigenvalues is unavailable, we introduce
an approximation method for the eigenvectors through a continuity property of
the invariants:


Continuity of Invariants w.r.t. $\mu$. In Sect. 4, we have shown that for any
invariant $c^{T}\cdot x + d \le 0$ of single-branch affine loops with deterministic updates,
the relationship between c and $\mu$ is given in two ways:

$$c = \begin{cases} \text{kernel vector of } T^{T} - \mu\cdot I & \text{when } \det(T^{T} - \mu\cdot I) = 0,\\ (T^{T} - \mu\cdot I)^{-1}\cdot z & \text{when } \det(T^{T} - \mu\cdot I) \ne 0, \end{cases}$$

with $z = P^{T}\cdot\xi$. Thus $c = c(\mu)$ can be regarded as a vector function of $\mu$
that is expressed differently at eigenvalues than at other points. $c(\mu)$ is clearly
continuous at points other than eigenvalues, while the following proposition
describes the continuity property of $c(\mu)$ at the eigenvalues:


Proposition 9. Suppose $\lambda$ is a real eigenvalue of $T^{T}$ with eigenvector $c(\lambda)$, and
$\{\lambda_i\}$ is a sequence lying in the feasible domain of $\mu$ which converges to $\lambda$. If $\lambda$
has geometric multiplicity 1, then the sequence $\{c(\lambda_i)\}$ converges to $c(\lambda)$ as well;
otherwise, $\{c(\lambda_i)\}$ converges to 0.


Due to the lack of space, the proof of Proposition 9 is omitted here and available 
in our full version [25]. 


An Algorithmic Approach to the Eigenvalue Method in Dimensions ≥ 5.
By Proposition 9, if $\lambda$ has geometric multiplicity 1, we can compute
$c(\lambda_i) = (T^{T} - \lambda_i\cdot I)^{-1}\cdot z$ (in the case of a tautological loop guard, we just replace z by
any non-zero n-dimensional real vector) to approximate the eigenvector $c(\lambda)$.
On the other hand, in the case that $\lambda$ has geometric multiplicity > 1, one can
adopt least-squares approximation as presented in [5, Section 8.9]. Though
least-squares approximation applies to eigenvalues of arbitrary
geometric multiplicity, our method is much easier to implement and more
efficient.
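The approximation $c(\lambda_i) = (T^{T} - \lambda_i\cdot I)^{-1}\cdot z$ closely resembles one step of the classical shifted inverse iteration. The sketch below is a hedged illustration on the 2×2 Fibonacci matrix (where the exact eigenvector $[1, \varphi]^{T}$ is known, so the error can be measured), not the authors' implementation; `approx_eigvec` is a hypothetical name, and the sequence $\lambda_i = \varphi + 10^{-i}$ stands in for the output of a root-finding algorithm.

```python
import math


def solve2(A, y):
    """Solve the 2x2 system A z = y (A invertible)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * y[0] - A[0][1] * y[1]) / det,
            (A[0][0] * y[1] - A[1][0] * y[0]) / det]


def approx_eigvec(T, lam_i, z):
    """c(lam_i) = (T^T - lam_i I)^{-1} z, rescaled to unit length (only the direction matters)."""
    A = [[T[0][0] - lam_i, T[1][0]],
         [T[0][1], T[1][1] - lam_i]]
    c = solve2(A, z)
    norm = math.hypot(c[0], c[1])
    return [ci / norm for ci in c]


T = [[0.0, 1.0], [1.0, 1.0]]                   # Fibonacci transition matrix (assumed)
phi = (1 + math.sqrt(5)) / 2                   # eigenvalue to be approximated
exact = [1 / math.hypot(1, phi), phi / math.hypot(1, phi)]   # unit vector along [1, phi]
z = [1.0, 0.0]                                 # for this symmetric T: not orthogonal to the eigenvector

errors = []
for k in range(1, 7):
    lam_i = phi + 10.0 ** (-k)                 # a sequence lam_i -> phi
    c = approx_eigvec(T, lam_i, z)
    # eigenvectors are only defined up to sign, so compare against +exact and -exact
    errors.append(min(math.dist(c, exact), math.dist(c, [-e for e in exact])))
print(errors)
```

The sketch compares directions only (normalizing and ignoring the sign), since for this example the unnormalized vector grows as $\lambda_i \to \lambda$; the measured error shrinks roughly in proportion to $|\lambda_i - \lambda|$.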


7 Experimental Results 


Experiment. We implement our automatic invariant-generation algorithm based on
eigenvalues and tight choices in Python 3.8 and use Sage [42] for matrix manipulation.
All results are obtained on an Intel Core i7 (2.00 GHz) machine with
64GB memory, running Ubuntu 18.04. Our benchmarks are affine loops chosen
from some benchmarks of the StInG invariant generator [40], a linear dynamical
system in [30], some loop programs in [41] and some other linear dynamical
systems resulting from well-known linear recurrences such as Fibonacci numbers,
Tribonacci numbers, etc.


Complexity. The main bottleneck of our algorithm lies in exactly solving or
approximating real roots of univariate polynomials (for computing eigenvalues
and boundary points in our algorithmic approach). The rest includes Gaussian
elimination with a single parameter (whose polynomial-time solvability is
guaranteed by [26]), matrix inversion and solving eigenvectors for fixed eigenvalues,
which can easily be done in polynomial time. The exact solution for degrees
less than 5 can be obtained by directly applying the solution formulas. The approximation
of real roots can be carried out through real root isolation followed by
divide-and-conquer (or Newton's method) in each obtained interval, which can
be completed in polynomial time (see e.g. [36] for the polynomial-time solvability
of real root isolation). Thus, our approach runs in polynomial time and is much
more efficient than quantifier elimination in [12].
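The complexity argument relies on isolating the real roots of the characteristic polynomial and then refining each isolating interval. A minimal Python sketch of one standard way to do this, using Sturm sequences and bisection in exact rational arithmetic, is shown below; it is our illustration and not necessarily the technique of [36] or the authors' implementation, and for simplicity it assumes a square-free input polynomial.

```python
from fractions import Fraction

# Polynomials are lists of Fraction coefficients, highest degree first.


def poly_rem(a, b):
    """Remainder of the polynomial division a / b."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)                          # the leading coefficient is now exactly 0
    while a and a[0] == 0:                # strip remaining leading zeros
        a.pop(0)
    return a


def derivative(p):
    n = len(p) - 1
    return [coef * (n - i) for i, coef in enumerate(p[:-1])]


def evaluate(p, x):
    acc = Fraction(0)
    for coef in p:                        # Horner's rule
        acc = acc * x + coef
    return acc


def sturm_sequence(p):
    seq = [p, derivative(p)]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-coef for coef in r])


def sign_changes(seq, x):
    signs = [v > 0 for v in (evaluate(q, x) for q in seq) if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def count_roots(seq, a, b):
    """Sturm's theorem: number of distinct real roots in (a, b]."""
    return sign_changes(seq, a) - sign_changes(seq, b)


def isolate(seq, lo, hi):
    """Bisect (lo, hi] until every piece contains exactly one distinct root."""
    stack, pieces = [(lo, hi)], []
    while stack:
        a, b = stack.pop()
        k = count_roots(seq, a, b)
        if k == 1:
            pieces.append((a, b))
        elif k > 1:
            m = (a + b) / 2
            stack += [(a, m), (m, b)]
    return sorted(pieces)


def refine(seq, a, b, tol):
    """Shrink an isolating interval (a, b] by bisection until b - a <= tol."""
    while b - a > tol:
        m = (a + b) / 2
        if count_roots(seq, a, m) == 1:
            b = m
        else:
            a = m
    return a, b


def real_roots(coeffs, tol=Fraction(1, 10**9)):
    p = [Fraction(c) for c in coeffs]
    bound = 1 + max(abs(c / p[0]) for c in p[1:])    # Cauchy bound on |root|
    seq = sturm_sequence(p)
    return [refine(seq, a, b, tol) for a, b in isolate(seq, -bound, bound)]


golden = [float((a + b) / 2) for a, b in real_roots([1, -1, -1])]   # x^2 - x - 1
print(golden)    # approximately [-0.618034, 1.618034]
```

Each bisection step halves the interval, so reaching width `tol` from the Cauchy bound B takes about $\log_2(2B/\mathrm{tol})$ steps per root; Newton's method, also mentioned above, would converge faster once an isolating interval is known.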


Results. The experimental results are presented in Table 1. In the table, the
column 'Loop' specifies the name of the benchmark, 'Dim(ension)' specifies the
number of program variables, 'μ' specifies the values obtained through eigenvalues of the
transition matrices (which we mark with e) or boundary points of the intervals
in the feasible domain, and 'Invariants' lists the affine invariants generated by our
approach. We compare our approach with the existing generators StInG [40]
and InvGen [22], where '=', '>', '>' and '4' mean that the generated invariants are
identical, more accurate, can only be generated in this work, and incomparable,
respectively. Table 2 compares the runtimes of our approach,
StInG and InvGen, measured in seconds. Note that the runtimes of
StInG and InvGen are obtained by executing their binaries on our platform.


Analysis. StInG [40] implements the constraint-solving method proposed in [12,37],
InvGen [22] integrates both the constraint-solving method and abstract interpretation,
while our approach uses matrix algebra to refine and upgrade the
constraint-solving method. Based on the results in Table 1 and Table 2, we conclude
that:
Table 1. Experimental Results of Invariants


Loop | Dim | μ | Invariants | [40] | [22]
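Fibonacci numbers | 2 | $|(1-\sqrt{5})/2|$ (e) | $2x_1 + (1-\sqrt{5})x_2 - 3 + \sqrt{5} \le 0$, $-2x_1 - (1-\sqrt{5})x_2 + 4 - 2\sqrt{5} \le 0$ | > | >
 | | $(1+\sqrt{5})/2$ (e) | $-2x_1 - (1+\sqrt{5})x_2 + 3 + \sqrt{5} \le 0$, $-2x_1 - (1+\sqrt{5})x_2 \le 0$ | |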
See-Saw [40] | 2 | 1 (e) | $x_1 - 2x_2 \le 0$, $-3x_1 + x_2 \le 0$ | = | >
Example 6.2 [30] | 4 | $|1-\sqrt{2}|$ (e) | $w - y - (1-\sqrt{2})x + (1-\sqrt{2})z \le 0$ | > | >
 | | $1+\sqrt{2}$ (e) | $w - y - (1+\sqrt{2})x + (1+\sqrt{2})z \le 0$ | |
css2003 [41] 3 0, le i-L! <0 =|= 
=i+1<0,c4k=1=0 
afnp2014 [41] 2 0, le, 1000/999 y—999 <0 = > 
—y <0,7-—999y -1<0 
gsv2008 [41] 2 0, 1e, 8/7 2—yt2<0 >| # 
—y < 0, —x — Ty — 50 < 0 
cggmp2005 [41] | 2 | 0, 1 (e), 4/3 | $i - j - 3 \le 0$, $-i + 1 \le 0$, $j - 10 \le 0$, $i + 2j - 21 = 0$, $-i + j - 9 \le 0$ | > | >
Jacobsthal numbers | 2 | $|-1|$ (e), 2 (e) | $2x_1 - x_2 - 1 \le 0$, $-2x_1 + x_2 - 1 \le 0$, $-x_1 - x_2 + 2 \le 0$ | > | >
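Pell numbers | 2 | $|1-\sqrt{2}|$ (e) | $x_1 + (1-\sqrt{2})x_2 - 3 + 2\sqrt{2} \le 0$, $-x_1 - (1-\sqrt{2})x_2 + 7 - 5\sqrt{2} \le 0$ | > | >
 | | $1+\sqrt{2}$ (e) | $-x_1 - (1+\sqrt{2})x_2 + 3 + 2\sqrt{2} \le 0$, $-x_1 - (1+\sqrt{2})x_2 \le 0$ | |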
Perrin numbers 3 A= i) eae a = S444 b=1/a+1 > > 
w= 4A, £1 + beg +003 > 3, +2A43 
Tribonacci numbers 3 A 9/3/33 + 19 a 4(4 i 4 +1),b=1/a+1 > > 
u = (54 + 1)/3e zı +bz2 +arg >b+a 


¹ L stands for the variable LARGE_INT in the original program [41]. Note that we modified the loop
programs in [41] into affine loops before execution.


Table 2. Experimental Results of Execution Time (s) 


Loop StInG [40] InvGen [22] Our Approach 

Fibonacci numbers 0.030 0.079 0.178 
See-Saw [40] 0.024 0.104 0.104 
Example 6.2 [30] 0.030 0.092 0.173 
css2003 [41] 0.019 0.111 0.193 
afnp2014 [41] 0.025 0.076 0.193 
gsv2008 [41] 0.027 0.092 0.207 
cggmp2005 [41] 0.026 0.111 0.184 
Jacobsthal numbers 0.026 0.085 0.193
Pell numbers 0.023 0.102 0.219 
Perrin numbers 0.031 0.129 0.250 
Tribonacci numbers 0.029 0.115 0.262 
For the benchmarks with rather simple transition matrices (identity or diagonal
matrices), our approach covers or outnumbers the invariants generated
by StInG and InvGen.

For the benchmarks with complicated transition matrices (i.e., matrices
far from diagonal ones), especially the ones with irrational
eigenvalues, our approach generates adequately accurate invariants, while StInG
and InvGen generate nothing or only trivial invariants.

For all benchmarks, StInG and InvGen run faster, but their runtimes are
comparable with ours, which shows the efficiency of our approach.


Summarizing all the above, the experimental results demonstrate the wider coverage
of μ values enabled by our approach, and show the generality and
efficiency of our approach.
Abstract. We propose a data-driven algorithm for numerical invariant
synthesis and verification. The algorithm is based on the ICE-DT schema 
for learning decision trees from samples of positive and negative states 
and implications corresponding to program transitions. The main issue 
we address is the discovery of relevant attributes to be used in the learn- 
ing process of numerical invariants. We define a method for solving this 
problem guided by the data sample. It is based on the construction of a 
separator that covers positive states and excludes negative ones, consis- 
tent with the implications. The separator is constructed using an abstract 
domain representation of convex sets. The generalization mechanism of 
the decision tree learning from the constraints of the separator allows the 
inference of general invariants, accurate enough for proving the targeted 
property. We implemented our algorithm and showed its efficiency. 


Keywords: Invariant synthesis · Data-driven program verification


1 Introduction 


Invariant synthesis for program safety verification is a highly challenging prob- 
lem. Many approaches exist for tackling this problem, including abstract inter- 
pretation, CEGAR-based symbolic reachability, property-directed reachability 
(PDR), etc. [3,5,6,8, 10,14, 17,19]. While those approaches are applicable to large 
classes of programs, they may have scalability limitations and fail to infer cer- 
tain types of invariants, such as disjunctive invariants. Emerging data-driven 
approaches, following the active learning paradigm with various machine learning
techniques, have shown their ability to efficiently solve complex instances
of the invariant synthesis problem [12,15,16,20,26,30,31]. These approaches are
based on the iterative interaction between a learner inferring candidate invari- 
ants from a data sample, i.e., a set of data classified either as positive examples, 
known to be reachable from the initial states and that therefore must be included 
in any solution, or negative examples, known to be predecessors of states violat- 
ing the safety property and that therefore cannot be included in any solution, 
and a teacher checking the validity of the proposed solutions and providing counterexamples
as feedback in case of non-validity. One such data-driven approach
is ICE [15], which has shown promising results with its instantiation ICE-DT [16]
that uses decision trees for the learning component. ICE is a learning approach
tailored for invariant synthesis, where the feedback provided by the teacher can
be, in addition to positive and negative examples, implications of the form p → q
expressing the fact that if p is in a solution, then necessarily q should also be
included in the solution since there is a transition in the program from p to q.

The strength of data-driven approaches is the generalization mechanisms of 
their learning components, allowing them to find relevant abstractions from a 
number of examples without exploring the whole state space of the program. In 
the case of ICE-DT, this is done by a sophisticated construction of decision trees 
classifying correctly the known positive and negative examples at some point, 
and taking into account the information provided by the implications. These 
decision trees, where the tested attributes are predicates on the variables of the 
program, are interpreted as formulas corresponding to candidate invariants. 

However, to apply data-driven methods such as ICE-DT, one needs to have 
a pool of attributes that are potentially relevant for the construction of the 
invariant. This is actually a crucial issue. In ICE-DT, as well as in most data- 
driven methods, finding the predicates involved in the invariant construction 
is based on systematic enumeration of formulas according to some pre-defined 
templates or grammars. For instance, in the case of numerical programs, the 
considered patterns are some special types of linear constraints, and candidate 
attributes are generated by enumerating all possible values for the coefficients 
under some fixed bound. While such a brute-force enumeration can be effective 
in many cases, it represents, in general, an obstacle for both scalability and 
finding sufficiently accurate inductive invariants in complex cases. 

In this paper, we provide an algorithmic method for efficient generation of 
attributes for data-driven invariant synthesis for numerical programs manipulat- 
ing integer variables. While enumerative approaches are purely syntactic and do 
not take into account the data sample, our method is guided by it. We show that 
this method, when integrated in the ICE-DT schema, leads to a new invariant 
synthesis algorithm outperforming state-of-the-art methods and tools. 

Our method for attribute discovery is based on computing, given an ICE data sample,
a separator of it as a union of convex sets, i.e., (1) it covers all the
positive examples, (2) it does not contain any negative example, and (3) it is
consistent with the implications (for every p → q in the sample, if the separator
contains p, then it should also contain q). Then, the set of attributes generated is 
the set of all constraints defining the separator. However, as for a given sample 
there might be several possible separators, a question is which separators to 
consider. Our approach is guided by two requirements: (1) we need to avoid big 
pools of attributes in order to reduce the complexity of the invariant construction 
process, and (2) we need to avoid having in the pool constraints that are (visibly) 
unnecessary, e.g. separating positive examples in a region without any negative 
ones. Therefore, we consider separators that satisfy the property that, whenever 
they contain two convex sets, it is impossible to take their convex union (smallest
convex set containing the union) without including a negative example. 
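The three separator conditions can be made concrete with the simplest abstract domain, intervals (boxes). The Python sketch below is an illustration under that assumption, with hypothetical names `in_union` and `is_separator`; an actual implementation would rely on an abstract-domain library such as APRON rather than hand-written boxes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]
Box = List[Tuple[int, int]]               # one closed interval [lo, hi] per variable


def in_box(x: Point, box: Box) -> bool:
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))


def in_union(x: Point, sep: Sequence[Box]) -> bool:
    return any(in_box(x, box) for box in sep)


def is_separator(sep, positives, negatives, implications) -> bool:
    """Check (1) all positives are covered, (2) no negative is covered and
    (3) for every implication p -> q, p in sep implies q in sep."""
    covers = all(in_union(x, sep) for x in positives)
    excludes = not any(in_union(x, sep) for x in negatives)
    closed = all(in_union(q, sep) for p, q in implications if in_union(p, sep))
    return covers and excludes and closed


positives = [(0, 0), (1, 1)]
negatives = [(5, 5)]
implications = [((1, 1), (2, 2))]         # a transition from (1, 1) to (2, 2)
print(is_separator([[(0, 2), (0, 2)]], positives, negatives, implications))   # True
print(is_separator([[(0, 1), (0, 1)]], positives, negatives, implications))   # False
```

The second candidate fails only condition (3): it contains (1, 1) but not its successor (2, 2).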

To represent and manipulate algorithmically convex sets, we consider 
abstract domains, e.g., intervals, octagons, and polyhedra, as they are defined in 
the abstract interpretation framework and implemented in tools such as APRON 
[18]. These domains correspond to particular classes of convex sets, defined by 
specific types of linear constraints. In these domains, the union operation is 
naturally over-approximated by the join operation that computes the best over- 
approximation of the union in the considered class of convex sets. Then, con- 
structing separators as explained above can be done by iterative application of 
the join operation while it does not include negative examples. 
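The iterative-join construction can likewise be sketched in the interval domain. The following Python code uses hypothetical names; implications are ignored for brevity, and pairs are tried in a fixed order, which is an arbitrary choice. It starts from one degenerate box per positive example and joins two boxes whenever the resulting box contains no negative example; the bounds of the final boxes then serve as candidate attributes.

```python
def join(a, b):
    """Interval-domain join: the smallest box containing both boxes."""
    return [(min(l1, l2), max(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b)]


def in_box(x, box):
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))


def build_separator(positives, negatives):
    """Start with one degenerate box per positive example and keep joining two
    boxes as long as the join contains no negative example."""
    boxes = [[(xi, xi) for xi in x] for x in positives]
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                cand = join(boxes[i], boxes[j])
                if not any(in_box(n, cand) for n in negatives):
                    boxes[i] = cand
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes


def attributes(boxes):
    """Candidate attributes: the bounds x_k >= lo and x_k <= hi of every box."""
    return sorted({(k, op, v) for box in boxes for k, (lo, hi) in enumerate(box)
                   for op, v in ((">=", lo), ("<=", hi))})


sep = build_separator([(0, 0), (1, 2), (6, 0), (7, 1)], negatives=[(3, 1)])
print(sep)              # [[(0, 1), (0, 2)], [(6, 7), (0, 1)]]
print(attributes(sep))
```

In the example, the two remaining boxes cannot be joined because their join $[0,7]\times[0,2]$ contains the negative example (3, 1), which is the maximality property required of the separators considered here.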

Then, this method for generating candidate attributes can be integrated into 
the ICE-DT schema: in each iteration of the ICE loop, given a sample, the learner (1)
generates a set of candidate attributes from a separator of the sample, (2) builds 
a decision tree from these attributes and proposes it as a candidate invariant 
to the teacher. Then, the teacher (1) checks that the proposed solution is an 
inductive invariant, and if it is not (2) provides a counterexample to the learner, 
extending the sample that will be used in the next iteration. 

One might ask why we construct a decision tree from the constraints of the separator instead of directly proposing the formula defining the separator as a candidate invariant to the teacher. The answer is that the decision tree construction is crucial for generalization. Indeed, the separator constructed from a given sample might be too specialized to that sample and, except in simple cases, does not yield a useful inductive invariant. For instance, the constructed separator is a union of bounded convex sets (polytopes), while invariants are very often unbounded convex sets (polyhedra). In this case, the effect of using decision trees is to select the relevant constraints and discard the unnecessary bounds, which very quickly leads to an unbounded solution that is general enough to be an inductive invariant. Without this generalization mechanism, the ICE loop would not terminate in such (quite common) cases.

The integration of our method can be made tighter and more efficient by 
making the process of building separators incremental along the ICE iterations: 
at each step, after the extension of the sample by the teacher, instead of con- 
structing a separator of the new sample from scratch, the parts of previously 
computed separators not affected by the last extension of the sample are reused. 

We have implemented our algorithm and carried out experiments on the SyGuS-Comp'19 benchmarks. Our method solves significantly more cases than the tools LoopInvGen [25,26], CVC4 [1,27], and Spacer [19], as well as our implementation of the original ICE-DT [16] algorithm (with template-based enumeration of attributes), with very competitive running times.


Related Work. Many learning-based approaches for the verification of numer- 
ical programs have been developed recently. One of the earliest approaches is 
Daikon [11]. Given a pool of formulas, it computes likely invariants from program 
executions. Later approaches were developed for the synthesis of sound invariants; for example, [30] iteratively generates a set of reachable and bad states and classifies them with a combination of half-spaces computed using SVM. In [29], the problem is reformulated as learning geometric concepts in machine learning. The first instantiation of the ICE framework was based on a constraint solver [15]. Later on, it was instantiated using the decision tree learning algorithm [16]. Both of these instantiations require a fixed template for the invariants or the formulas appearing in them. LoopInvGen enumerates predicates on-demand using
the approach introduced in [26]. This is extended to a mechanism with hybrid 
enumeration of several domains or grammars [25]. Continuous logic networks 
were also used to tackle the problem in CLN2INV [28]. Code2Inv [31], the first 
approach to introduce general deep learning methods to program verification, 
uses a graph neural network to capture the program structure and reinforcement 
learning to guide the search heuristic of a particular domain. 

The learning approach of ICE and ICE-DT has been generalized to solve problems given as constrained Horn clauses (CHC) in Horn-ICE [12] and HoICE
[4]. Outside the ICE framework, [33] proposed a learning approach for solving 
CHC using decision trees and SVM for the synthesis of candidate predicates from 
a set of reachable and bad states of the program. The limitation of the non-ICE- 
based approach is that when the invariant is not inductive, the program has to 
be rerun, forward and backward, to generate more reachable and bad states. 

In more theoretical work, an abstract learning framework for synthesis, 
introduced in [21], incorporates the principle of CEGIS (counterexample-guided 
inductive synthesis). A study of overfitting in invariant synthesis was conducted in [25]. ICE was compared with IC3/PDR in terms of complexity in [13]. A
generalization of ICE with relative inductiveness [32] can implement IC3/PDR 
following the paradigm of active learning with a learner and a teacher. 

Automatic invariant synthesis and verification has been addressed by many 
other techniques based on exploring and computing various types of abstract 
representations of reachable states (e.g., [3,5,6,8,10,14,17,19]). Notice that, 
although we use abstract domains for representation and manipulation of convex 
sets, our strategy for exploring the set of potential invariants is different from 
the ones used typically in abstract interpretation analysis algorithms [8]. 


2 Safety Verification Using Learning of Invariants 


This section presents the approach we use for solving the safety verification prob- 
lem. It is built upon the ICE framework [15] and in particular its instantiation 
with the learning of decision trees [16]. We first define the verification problem. 


2.1 Linear Constraints and Safety Verification 


Let $X$ be a set of variables. Linear formulas over $X$ are Boolean combinations of linear constraints of the form $\sum_{i=1}^{n} a_i x_i \le b$, where the $x_i$'s are variables in $X$, the $a_i$'s are integer constants, and $b \in \mathbb{Z} \cup \{+\infty\}$. We use linear formulas to reason symbolically about programs with integer variables. Assume we have a program with a set of variables $V$ and let $n = |V|$. A state of the program is a vector of integers in $\mathbb{Z}^n$. Primed versions of these variables are used to encode the transition relation $T$ of the program: for each $v \in V$, we consider a variable $v'$ to represent the value of $v$ after the transition. Let $V'$ be the set of primed variables; the relation $T$ is defined by linear formulas over $V \cup V'$.
The safety verification problem consists in deciding, given a set of safe states $\mathit{Good}$, whether all the states reachable from a set of initial states $\mathit{Init}$ by iterative application of $T$ are in $\mathit{Good}$. Dually, this is equivalent to deciding whether, starting from $\mathit{Init}$, it is possible to reach a state in $\mathit{Bad}$, the set of unsafe states (the complement of $\mathit{Good}$). Assuming that the sets $\mathit{Init}$ and $\mathit{Good}$ can be defined using linear formulas, the safety verification problem amounts to finding an adequate inductive invariant $I$ such that the following three formulas are valid:


$$\mathit{Init}(V) \Rightarrow I(V) \qquad (1)$$
$$I(V) \Rightarrow \mathit{Good}(V) \qquad (2)$$
$$I(V) \wedge T(V, V') \Rightarrow I(V') \qquad (3)$$


We are looking for inductive invariants which can be expressed as a linear 
formula. In that case, the validity of the three formulas is decidable and can be 
checked with a standard SMT solver. 


2.2 The ICE Learning Framework 


ICE [15] follows the active learning paradigm to learn adequate inductive invari- 
ants of a given program and a given safety property. It consists of an iteratively 
communicating learner and a teacher (see Algorithm 1). 


Input : A transition system and a property: (Init, T, Good)
Output: An adequate invariant or error
1 initialize ICE-sample S = (S+, S-, S→);
2 while true do
3     J ← LEARN(S);
4     (success, counterexample) ← ISINDUCTIVE(J);
5     if success then return J;
6     else
7         S ← UPDATE(S, counterexample);
8         if contradictory(S) then return error;

Algorithm 1: The main loop of ICE.


In each iteration, in line 3, the learner, which does not know anything about the program, synthesizes a candidate invariant (as a formula over the program variables) from a sample $S$ (containing information about program states) which is enriched during the learning process. Contrary to other learning methods, the sample $S$ contains not only a set of positive states $S^+$ which should satisfy the invariant and a set of negative states $S^-$ which should not satisfy it, but also a set of implications $S^\rightarrow$ of the form $s \rightarrow s'$, meaning that if $s$ satisfies the invariant, then $s'$ should satisfy it as well (because there is a transition from $s$ to $s'$ in the transition relation of the program). Therefore, an ICE-sample $S$ is a triple $(S^+, S^-, S^\rightarrow)$ where, to account for the information contained in implications, it is additionally required that

$$\forall s \rightarrow s' \in S^\rightarrow:\ \text{if } s \in S^+ \text{ then } s' \in S^+, \text{ and if } s' \in S^- \text{ then } s \in S^- \qquad (4)$$


The sample is initially empty (or contains some states whose status, positive or negative, is known). It is assumed that a candidate invariant $J$ proposed by the learner is consistent with the sample, i.e. the states in $S^+$ satisfy $J$, the states in $S^-$ falsify it, and for implications $s \rightarrow s' \in S^\rightarrow$ it is not the case that $s$ satisfies $J$ but $s'$ does not. Given a candidate invariant $J$ provided by the learner in line 3, the teacher, who knows the transition relation $T$, checks in line 4 whether $J$ is an inductive invariant; if so, the process stops, since an invariant has been found; otherwise a counterexample is provided and used in line 7 to update the sample for the next iteration. The teacher checks the three conditions an inductive invariant must satisfy (see Sect. 2.1). If (1) is violated, the counterexample is a state $s$ which should be in the invariant because it is in $\mathit{Init}$; therefore $s$ is added to $S^+$. If (2) is violated, the counterexample is a state $s$ which should not be in the invariant because it is not in $\mathit{Good}$, and $s$ is added to $S^-$. If (3) is violated, the counterexample is an implication $s \rightarrow s'$ where, if $s$ is in the invariant, $s'$ should also be in it; therefore $s \rightarrow s'$ is added to $S^\rightarrow$. In all three cases, the sample is updated to satisfy property (4). If this leads to a contradictory sample, i.e. $S^+ \cap S^- \neq \emptyset$, the program is incorrect and an error is returned. In general, the loop is not guaranteed to terminate.
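The sample update of line 7 in Algorithm 1, including the propagation required by property (4) and the contradiction test of line 8, can be sketched in Python as follows. This is a minimal illustration under our own assumptions: states are integer tuples, and the class `ICESample` and its methods `update`, `close` and `contradictory` are hypothetical names, not part of any existing tool. After each counterexample is added, `close` pushes positives forward and negatives backward along implications until property (4) holds.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

State = Tuple[int, ...]


@dataclass
class ICESample:
    """An ICE sample (S+, S-, S->) over integer states (hypothetical encoding)."""
    pos: Set[State] = field(default_factory=set)
    neg: Set[State] = field(default_factory=set)
    impl: Set[Tuple[State, State]] = field(default_factory=set)

    def close(self) -> None:
        """Enforce property (4): push positives forward and negatives backward
        along implications until a fixpoint is reached."""
        changed = True
        while changed:
            changed = False
            for s, t in self.impl:
                if s in self.pos and t not in self.pos:
                    self.pos.add(t)
                    changed = True
                if t in self.neg and s not in self.neg:
                    self.neg.add(s)
                    changed = True

    def update(self, kind: str, cex) -> None:
        """Add a counterexample violating (1) ('pos'), (2) ('neg') or (3) ('impl')."""
        if kind == "pos":
            self.pos.add(cex)
        elif kind == "neg":
            self.neg.add(cex)
        elif kind == "impl":
            self.impl.add(cex)
        else:
            raise ValueError(f"unknown counterexample kind: {kind}")
        self.close()

    def contradictory(self) -> bool:
        """True iff S+ and S- intersect, i.e. the program is incorrect."""
        return bool(self.pos & self.neg)
```

For instance, after adding the implication (0,0) → (1,1) and then the positive state (0,0), the state (1,1) becomes positive; if (1,1) is later reported as negative, the sample becomes contradictory.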


2.3 ICE-DT: Invariant Learning Using Decision Trees 


In [16], the ICE learning framework is instantiated with a learn method, which 
extends classical decision tree learning algorithms with the handling of impli- 
cations. In the context of invariant synthesis, decision trees are used to classify 
points from a universe, which is the set of program states. They are binary trees 
whose inner nodes are labeled by predicates from a set of attributes and whose 
leaves are either + or −. Attributes are (atomic) formulas over the variables
of the program. They can be seen as boolean functions that the decision tree 
learning algorithm will compose to construct a classifier of the given ICE sample. 
In our case of numerical programs manipulating integer variables, attributes are 
linear inequalities. Then, a decision tree can be seen naturally as a quantifier-free 
formula over program variables. 

The main idea of the ICE-DT learner (see Algorithm 2) is as follows. Initially, the learner fixes a set of attributes (possibly empty), which is kept in a global variable and updated in successive executions of LEARN(S). In line 2, given a sample, the learner checks whether the current set of attributes is sufficient to produce a decision tree corresponding to a formula consistent with the sample. If the check is successful, the sample $S$ is changed to $S_{Attr}$, taking into account information gathered during the check (see below for the details of SUFFICIENT(Attributes, S)). If the check fails, new attributes are generated with GENERATEATTRIBUTES(Attributes, S) until it succeeds. Then, a decision tree is constructed in line 6 from the sample $S_{Attr}$ by CONSTRUCT-TREE($S_{Attr}$, Attributes), which we present below (Algorithm 3). It is transformed into a formula and returned as a potential invariant. Notice that in the main ICE loop of Algorithm 1, the teacher then checks whether this invariant is inductive. If not, the original sample $S$ is updated, and in the next iteration the learner checks whether the attributes are still sufficient for the updated sample. If not, the learner generates new attributes, proceeds with constructing another decision tree, and so on.

Input : An ICE sample S = (S+, S-, S→)
Output: A formula
Global : Attributes initialized with InitialAttributes
1 Proc LEARN(S)
2     (success, S_Attr) ← SUFFICIENT(Attributes, S);
3     while ¬success do
4         Attributes ← GENERATEATTRIBUTES(Attributes, S);
5         (success, S_Attr) ← SUFFICIENT(Attributes, S);
6     return tree_to_formula(CONSTRUCT-TREE(S_Attr, Attributes))

Algorithm 2: The ICE-DT learner LEARN(S) procedure.

An important question is how to choose InitialAttributes and how to generate new attributes when needed. In [16], the set InitialAttributes is, for example, the set of octagons over program variables with absolute values of constants bounded by $c \in \mathbb{N}$. If these attributes are not sufficient to classify the sample, new attributes are generated simply by increasing the bound $c$ by 1. We use a different method, described in detail in Sect. 4. We now describe how a decision tree can be constructed from an ICE sample and a set of attributes.
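The template-based attribute pool of [16] recalled above can be illustrated with a short Python sketch. We assume here that an octagonal attribute has the form ±x_i ≤ k or ±x_i ± x_j ≤ k with |k| ≤ c, and we encode a constraint a·x ≤ k as a pair `(a, k)`; the function name `octagon_attributes` and this encoding are our own hypothetical choices and need not match the actual implementation of [16]. The sketch makes visible how the pool grows with c: each increment of c adds two constants per octagonal shape.

```python
from itertools import combinations
from typing import List, Tuple

Constraint = Tuple[Tuple[int, ...], int]  # (a, k) encodes a . x <= k


def octagon_attributes(n_vars: int, c: int) -> List[Constraint]:
    """Octagonal template constraints +-x_i <= k and +-x_i +- x_j <= k, |k| <= c."""
    shapes = []
    for i in range(n_vars):                       # unary shapes +-x_i
        for a in (1, -1):
            coeffs = [0] * n_vars
            coeffs[i] = a
            shapes.append(tuple(coeffs))
    for i, j in combinations(range(n_vars), 2):   # binary shapes +-x_i +- x_j
        for a in (1, -1):
            for b in (1, -1):
                coeffs = [0] * n_vars
                coeffs[i], coeffs[j] = a, b
                shapes.append(tuple(coeffs))
    return [(shape, k) for shape in shapes for k in range(-c, c + 1)]
```

With two variables there are 8 shapes, so the pool contains 8(2c + 1) attributes; a learner following this scheme would call `octagon_attributes(n, c + 1)` whenever the current pool is insufficient.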


Decision Tree Learning Algorithms. The well-known standard decision tree 
learning algorithms like ID3 [23] take as an input a sample containing points 
marked as positive or negative of some universe and a fixed set Attributes. 
They construct a decision tree by choosing as the root an attribute, splitting 
the sample in two (one with all points satisfying the attribute and one with the 
other points) and recursively constructing trees for the two subsamples. At each 
step the attribute maximizing the information gain computed using the entropy 
of subsamples is chosen. Intuitively, this means that at each step the attribute that best separates the positive and negative points is chosen. In the context
of verification, exact classification is needed, and therefore, all points in a leaf 
must be classified in a way consistent with the sample. 
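As an illustration of the entropy-based choice described above, the following Python sketch computes the classical (ID3-style) information gain of an attribute over positive and negative points only. It deliberately ignores implications: the gain functions of [16] that handle them (e.g. by penalizing attributes that cut implications) are not reproduced here. The function names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[int, ...]


def entropy(n_pos: int, n_neg: int) -> float:
    """Binary entropy (in bits) of a sample with n_pos positive and n_neg negative points."""
    if n_pos == 0 or n_neg == 0:
        return 0.0
    total = n_pos + n_neg
    p, q = n_pos / total, n_neg / total
    return -p * math.log2(p) - q * math.log2(q)


def information_gain(pos: Sequence[Point], neg: Sequence[Point],
                     attr: Callable[[Point], bool]) -> float:
    """Entropy of the sample minus the weighted entropy of the two subsamples
    obtained by splitting on attr (classical gain, implications ignored)."""
    total = len(pos) + len(neg)
    if total == 0:
        return 0.0
    pos_t = sum(1 for p in pos if attr(p))
    neg_t = sum(1 for n in neg if attr(n))
    pos_f, neg_f = len(pos) - pos_t, len(neg) - neg_t
    after = ((pos_t + neg_t) * entropy(pos_t, neg_t)
             + (pos_f + neg_f) * entropy(pos_f, neg_f)) / total
    return entropy(len(pos), len(neg)) - after
```

On the one-dimensional sample with positives {1, 2} and negatives {5, 6}, the attribute x ≤ 3 yields two pure subsamples and therefore the maximal gain of 1 bit, whereas x ≤ 5 leaves one negative point with the positives and scores lower.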

In [16] this idea is extended to also handle implications, which is essential for an ICE learner. The basic algorithm to construct a tree (given as Algorithm 3 below) gets as input an ICE sample $S = (S^+, S^-, S^\rightarrow)$ and a set of Attributes and produces a decision tree consistent with the sample, which means that each point in $S^+$ (resp. $S^-$) is classified as positive (resp. negative) and for each implication $(s, s') \in S^\rightarrow$ it is not the case that $s$ is classified as positive and $s'$ as negative. The initial sample $S$ is assumed to be consistent.


Input : An ICE sample S = (S+, S-, S→) and a set of Attributes.
Output: A tree
1 Proc CONSTRUCT-TREE(S, Attributes)
2     Set G (partial mapping of end-points of impl. to {POSITIVE, NEGATIVE}) to empty;
3     Let Unclass be the set of all end-points of implications in S→;
4     Compute the implication closure of G w.r.t. S;
5     return DECISIONTREEICE((S+, S-, Unclass), Attributes);
6 Proc DECISIONTREEICE(Examples = (Pos, Neg, Unclass), Attributes)
7     Move all points of Unclass classified as POSITIVE (resp. NEGATIVE) to Pos (resp. Neg);
8     if Neg = ∅ then
9         Mark all points of Unclass in G as POSITIVE;
10        Compute the implication closure of G w.r.t. S;
11        return Leaf(+);
12    else if Pos = ∅ then
13        Mark all points of Unclass in G as NEGATIVE;
14        Compute the implication closure of G w.r.t. S;
15        return Leaf(−);
16    else
17        a ← CHOOSE(Attributes, Examples);
18        Divide Examples into two: Examples_a with all points satisfying a and Examples_¬a the others;
19        T_left ← DECISIONTREEICE(Examples_a, Attributes \ {a});
20        T_right ← DECISIONTREEICE(Examples_¬a, Attributes \ {a});
21        return Tree(a, T_left, T_right);

Algorithm 3: The ICE-DT decision-tree learning procedures.


The learner is similar to the classical decision tree learning algorithms. However, it has to take care of implications. To this end, the learner also considers the set of points appearing as end-points in the implications but not in $S^+$ and $S^-$. These points are initially considered unclassified, and the learner marks each of them POSITIVE or NEGATIVE during the construction as follows: if, in the construction of the tree, a subsample is reached containing only positive (resp. negative) points and unclassified points (lines 8 and 12 resp.), all these points are classified as positive (resp. negative). To make sure that implications remain consistent, the implication closure with the newly classified points is computed and stored in the global variable $G$, a partial mapping of end-points in $S^\rightarrow$ to {POSITIVE, NEGATIVE}. The implication closure of $G$ w.r.t. $S$ is defined as follows: if ($G(s) = $ POSITIVE or $s \in S^+$) and $(s, s') \in S^\rightarrow$, then also $G(s') = $ POSITIVE; if ($G(s') = $ NEGATIVE or $s' \in S^-$) and $(s, s') \in S^\rightarrow$, then also $G(s) = $ NEGATIVE.
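The implication closure of G w.r.t. S defined above can be computed by a simple fixpoint iteration. The Python sketch below is our own illustration: `implication_closure` and the `Conflict` exception are hypothetical names, and we additionally raise a conflict when a state of S⁺ would be marked NEGATIVE or a state of S⁻ POSITIVE, which is one possible way of reporting an inconsistent classification.

```python
from typing import Dict, Set, Tuple

State = Tuple[int, ...]
POSITIVE, NEGATIVE = "POSITIVE", "NEGATIVE"


class Conflict(Exception):
    """Raised when a state would be classified both POSITIVE and NEGATIVE."""


def implication_closure(G: Dict[State, str], pos: Set[State], neg: Set[State],
                        impl: Set[Tuple[State, State]]) -> Dict[State, str]:
    """Return the implication closure of the partial mapping G w.r.t. the sample."""
    G = dict(G)  # do not modify the caller's mapping

    def mark(s: State, label: str) -> bool:
        # Record label for s; return True iff the mapping changed.
        if (label == POSITIVE and (s in neg or G.get(s) == NEGATIVE)) or \
           (label == NEGATIVE and (s in pos or G.get(s) == POSITIVE)):
            raise Conflict(s)
        if s in pos or s in neg or G.get(s) == label:
            return False
        G[s] = label
        return True

    changed = True
    while changed:
        changed = False
        for s, t in impl:
            if G.get(s) == POSITIVE or s in pos:      # positives propagate forward
                changed = mark(t, POSITIVE) or changed
            if G.get(t) == NEGATIVE or t in neg:      # negatives propagate backward
                changed = mark(s, NEGATIVE) or changed
    return G
```

With the implications of Example 1 below, marking (0,2) as POSITIVE forces (4,0) to be POSITIVE as well, mirroring the step described in that example.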

The set Attributes is such that a consistent decision tree will always be 
found, i.e. the set Attributes in line 17 is never empty (see below). An attribute 
in a node is chosen with CHOOSE(Attributes, Examples), which returns an attribute $a \in$ Attributes with the highest gain according to Examples. We do not give
the details of this function. In [16] several gain functions are defined extending 
the classical gain function based on entropy with the treatment of implications. 
We use the one which penalizes cutting implications (like ICE-DT-penalty). 


Checking if the Set of Attributes is Sufficient. Here we show how the function SUFFICIENT(Attributes, S) of Algorithm 2 is implemented in [16]. Two states $s$ and $s'$ are considered equivalent (denoted by $s \equiv_{Attributes} s'$) if they satisfy the same attributes of Attributes. One has to make sure that two equivalent states are never classified differently by the tree construction algorithm. This is done by the following procedure: for any two states $s, s'$ with $s \equiv_{Attributes} s'$ which appear in the sample (as positive or negative states or as end-points of implications), the two implications $s \rightarrow s'$ and $s' \rightarrow s$ are added to $S^\rightarrow$.

Then, the implication closure of the sample is computed starting from an empty mapping $G$ (all end-points are initially unclassified). If, during the computation of the implication closure, some end-point is classified as both POSITIVE and NEGATIVE, then SUFFICIENT(Attributes, S) returns (false, S); otherwise it returns (true, $S_{Attr}$), where $S_{Attr}$ is obtained from $S = (S^+, S^-, S^\rightarrow)$ by adding to $S^+$ the end-points of implications classified as POSITIVE and to $S^-$ the end-points classified as NEGATIVE.

In [16] it is shown that this guarantees that a tree consistent with the sample will always be constructed, regardless of the order in which attributes are chosen. We now illustrate the ICE-DT learner on a simple example.
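The sufficiency check described above can be sketched in Python as follows. This is our own illustration, not the implementation of [16]: attributes are modeled as Python predicates, the name `sufficient` is hypothetical, and on failure the sketch returns `(False, None)` instead of the unchanged sample. Equivalent states (same attribute signature) are linked by implications in both directions, after which the implication closure is computed by propagating positives forward and negatives backward.

```python
from itertools import combinations
from typing import Callable, List, Optional, Set, Tuple

State = Tuple[int, ...]
Attr = Callable[[State], bool]


def sufficient(attrs: List[Attr], pos: Set[State], neg: Set[State],
               impl: Set[Tuple[State, State]]
               ) -> Tuple[bool, Optional[Tuple[Set[State], Set[State]]]]:
    """Sketch of SUFFICIENT: returns (True, (pos', neg')) or (False, None)."""
    def signature(s: State) -> Tuple[bool, ...]:
        return tuple(a(s) for a in attrs)

    points = set(pos) | set(neg) | {s for edge in impl for s in edge}
    edges = set(impl)
    # Equivalent states must be classified alike: link them both ways.
    for s, t in combinations(points, 2):
        if signature(s) == signature(t):
            edges.add((s, t))
            edges.add((t, s))
    # Implication closure, starting with no end-point classified.
    new_pos, new_neg = set(pos), set(neg)
    changed = True
    while changed:
        changed = False
        for s, t in edges:
            if s in new_pos and t not in new_pos:
                new_pos.add(t)
                changed = True
            if t in new_neg and s not in new_neg:
                new_neg.add(s)
                changed = True
    if new_pos & new_neg:   # some state would be both positive and negative
        return False, None
    return True, (new_pos, new_neg)
```

On the sample and attributes of Example 1 below, this sketch succeeds and adds (2,2) and (2,3) to the positive states, while a single attribute x ≥ 5 is insufficient because it makes (1,1) and (4,1) equivalent.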


Example 1. Let $S = (S^+, S^-, S^\rightarrow)$ be a sample (illustrated in Fig. 1) with two-dimensional states (variables $x$ and $y$): $S^+ = \{(1,1), (1,4), (3,1), (5,1), (5,4), (6,1), (6,4)\}$, $S^- = \{(4,1), (4,2), (4,3), (4,4)\}$, $S^\rightarrow = \{(2,2) \rightarrow (2,3), (0,2) \rightarrow (4,0)\}$. We suppose that $\mathit{Attributes} = \{x \ge 1, x \le 3, y \ge 1, y \le 4, x \ge 5, x \le 6\}$ is given. In Sect. 4 we show how to obtain this set from the sample. The learner first checks that the set Attributes is sufficient to construct a formula consistent with $S$. The check succeeds and, among others, $(2,2)$, $(2,3)$ and the surrounding positive states on the left are all equivalent w.r.t. $\equiv_{Attributes}$. Therefore, after adding implications (which we omit for clarity in the following) and computing the implication closure, both $(2,2)$ and $(2,3)$ are added to $S^+$. Then, the construction of the tree is started with Examples containing 9 positive, 4 negative and 2 unclassified states. An attribute is chosen depending on the gain function. Here, it is $x \ge 5$, since it separates all the positive states on the right from the rest and does not cut the implication. The set Examples is split into the states satisfying $x \ge 5$ and those which do not: $\mathit{Examples}_{x \ge 5}$ and $\mathit{Examples}_{x < 5}$. $\mathit{Examples}_{x \ge 5}$ contains only the positive states $\{(5,1), (5,4), (6,1), (6,4)\}$ and the branch is finished, whereas $\mathit{Examples}_{x < 5}$ contains the remaining positive, negative and unclassified states and the construction continues. The attribute $x \le 3$ is chosen and $\mathit{Examples}_{x < 5}$ is split in two. $\mathit{Examples}_{x < 5 \wedge x \le 3}$ contains the positive states $\{(1,1), (1,4), (3,1), (2,2), (2,3)\}$ and one unclassified state $(0,2)$. Therefore, the algorithm marks $(0,2)$ as positive and, as there is an implication $(0,2) \rightarrow (4,0)$, the state $(4,0)$ is marked positive as well and a leaf node is returned. The other branch $\mathit{Examples}_{x < 5 \wedge x > 3}$ now contains the negative states $\{(4,1), (4,2), (4,3), (4,4)\}$ and a positive state $(4,0)$. Therefore another attribute is needed. Finally, the algorithm returns a tree corresponding to the formula $x \ge 5 \vee (x < 5 \wedge x \le 3) \vee (x < 5 \wedge x > 3 \wedge y < 1)$.
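The formula obtained at the end of Example 1 can be checked against the sample with a few lines of Python. The sketch below (hypothetical names, sample data copied from the example) tests the three ICE consistency requirements: every positive state satisfies the formula, every negative state falsifies it, and no implication goes from a satisfying state to a falsifying one.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]

# Sample of Example 1
POS: List[Point] = [(1, 1), (1, 4), (3, 1), (5, 1), (5, 4), (6, 1), (6, 4)]
NEG: List[Point] = [(4, 1), (4, 2), (4, 3), (4, 4)]
IMPL: List[Tuple[Point, Point]] = [((2, 2), (2, 3)), ((0, 2), (4, 0))]


def candidate(x: int, y: int) -> bool:
    """The formula returned by the learner in Example 1."""
    return x >= 5 or (x < 5 and x <= 3) or (x < 5 and x > 3 and y < 1)


def consistent(f: Callable[[int, int], bool]) -> bool:
    """ICE consistency: S+ satisfied, S- falsified, and no implication
    s -> s' with s satisfied and s' falsified."""
    return (all(f(*p) for p in POS)
            and not any(f(*n) for n in NEG)
            and all(f(*t) or not f(*s) for s, t in IMPL))


print(consistent(candidate))             # True
print(consistent(lambda x, y: x <= 3))   # False: misses the positives on the right
```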
3 Linear Formulas as Abstract Objects


Algorithm 2 requires a set of attributes as input. In Sect. 4, we show how to generate these attributes from the sample. For that purpose, we use numerical abstract domains to algorithmically represent and manipulate sets of integer vectors representing program states. We consider standard numerical domains defined in [7,9,22] and implemented in tools such as APRON [18]: Intervals, Octagons, and Polyhedra.

Given a set of $n$ variables $X$ and a linear formula $\varphi$ over $X$, let $[\![\varphi]\!] \subseteq \mathbb{Z}^n$ be the set of all integer points satisfying the formula. Now, a subset of $\mathbb{Z}^n$ is called

– an interval, iff it is equal to $[\![\varphi]\!]$ where $\varphi$ is a conjunction of constraints of the form $\alpha \le x \le \beta$, where $x \in X$, $\alpha \in \mathbb{Z} \cup \{-\infty\}$ and $\beta \in \mathbb{Z} \cup \{+\infty\}$.

– an octagon, iff it is equal to $[\![\varphi]\!]$ where $\varphi$ is a conjunction of constraints of the form $\pm x \pm y \le a$ where $x, y \in X$ and $a \in \mathbb{Z} \cup \{+\infty\}$.

– a polyhedron, iff it is equal to $[\![\varphi]\!]$ where $\varphi$ is a conjunction of linear constraints of the form $\sum_{i=1}^{n} a_i x_i \le b$ where $X = \{x_1, \ldots, x_n\}$, $a_i \in \mathbb{Z}$ for every $i$, and $b \in \mathbb{Z} \cup \{+\infty\}$.


Now, we can define several abstract domains as complete lattices $\mathcal{A}^{type} = (D^{type}, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$, where type is either int, oct or poly, and $D^{int}$ is the set of intervals, $D^{oct}$ the set of octagons and $D^{poly}$ the set of polyhedra.

The relation $\sqsubseteq$ is set inclusion. The binary operation $\sqcup$ (resp. $\sqcap$) is the join (resp. meet) operation that defines the smallest (resp. greatest) element of $D^{type}$ that contains (resp. is contained in) the union (resp. the intersection) of the two composed elements. Finally, $\bot$ (resp. $\top$) corresponds to the empty set (resp. $\mathbb{Z}^n$).
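To make these lattice operations concrete for the simplest domain, here is a minimal Python sketch of intervals (boxes) over integer points. It is written for illustration only and is not APRON's API; the class `Box` and its methods are hypothetical. For boxes, join is the componentwise hull and meet the componentwise intersection, so the join is indeed the smallest box containing the union of its operands.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Bounds = Tuple[Tuple[float, float], ...]  # one (lo, hi) pair per variable


@dataclass(frozen=True)
class Box:
    """An element of the interval domain; bounds=None encodes bottom (empty set)."""
    bounds: Optional[Bounds]

    @staticmethod
    def point(p: Sequence[int]) -> "Box":
        return Box(tuple((v, v) for v in p))

    @staticmethod
    def top(n: int) -> "Box":
        return Box(tuple((float("-inf"), float("inf")) for _ in range(n)))

    def contains(self, p: Sequence[int]) -> bool:
        return self.bounds is not None and all(
            lo <= v <= hi for (lo, hi), v in zip(self.bounds, p))

    def leq(self, other: "Box") -> bool:
        """Set inclusion, the order of the lattice."""
        if self.bounds is None:
            return True
        if other.bounds is None:
            return False
        return all(lo2 <= lo1 and hi1 <= hi2
                   for (lo1, hi1), (lo2, hi2) in zip(self.bounds, other.bounds))

    def join(self, other: "Box") -> "Box":
        """Smallest box containing both operands (componentwise hull)."""
        if self.bounds is None:
            return other
        if other.bounds is None:
            return self
        return Box(tuple((min(lo1, lo2), max(hi1, hi2))
                         for (lo1, hi1), (lo2, hi2) in zip(self.bounds, other.bounds)))

    def meet(self, other: "Box") -> "Box":
        """Intersection, which is exact for boxes."""
        if self.bounds is None or other.bounds is None:
            return Box(None)
        b = tuple((max(lo1, lo2), min(hi1, hi2))
                  for (lo1, hi1), (lo2, hi2) in zip(self.bounds, other.bounds))
        return Box(None) if any(lo > hi for lo, hi in b) else Box(b)
```

For example, joining the points (1,1), (3,1) and (1,4) yields the box [1,3] × [1,4], which contains (2,3); its meet with the singleton {(4,1)} is bottom, i.e. the join does not include that state.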

We suppose that we have a function $\mathit{Form}^{type}(d)$ which, given an element $d \subseteq \mathbb{Z}^n$ of the lattice, provides us with a formula $\varphi$ of the corresponding type such that $[\![\varphi]\!] = d$. There are many ways to describe the set $d$ with a formula $\varphi$; therefore, the function $\mathit{Form}^{type}(d)$ depends on the particular implementation of the abstract domains. We furthermore define $\mathit{Constr}^{type}(d)$ to be the set of linear constraints of $\mathit{Form}^{type}(d)$.

We drop the superscript type from all preceding definitions, when it is clear 
from the context or when we define notions for all types. 

All singleton subsets of $\mathbb{Z}^n$ are elements of the lattices. For example, if $p = (x = 1, y = 2)$, then for the domains of Intervals, Octagons, and Polyhedra as implemented in APRON we have: $\mathit{Constr}^{int}(\{p\}) = \{x \le 1, x \ge 1, y \le 2, y \ge 2\}$, $\mathit{Constr}^{oct}(\{p\}) = \{x \ge 1, x \le 1, y - x \ge 1, x + y \ge 3, y \ge 2, y \le 2, x + y \le 3, x - y \ge -1\}$ and $\mathit{Constr}^{poly}(\{p\}) = \{x = 1, y = 2\}$.
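To make the octagonal constraints above concrete: the tightest octagon containing a finite set of points can be obtained by bounding each octagonal expression ±x_i and ±x_i ± x_j by its maximum over the points. The Python sketch below uses hypothetical function names and encodes a constraint a·x ≤ b as a pair `(a, b)`. For p = (1, 2) it recovers the same eight constraints as listed above for APRON, all written in ≤ form; APRON itself may present them differently.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Constraint = Tuple[Tuple[int, ...], int]  # (a, b) encodes a . x <= b


def octagon_directions(n: int) -> List[Tuple[int, ...]]:
    """All coefficient vectors +-x_i and +-x_i +- x_j over n variables."""
    dirs = []
    for i in range(n):
        for a in (1, -1):
            d = [0] * n
            d[i] = a
            dirs.append(tuple(d))
    for i, j in combinations(range(n), 2):
        for a in (1, -1):
            for b in (1, -1):
                d = [0] * n
                d[i], d[j] = a, b
                dirs.append(tuple(d))
    return dirs


def octagon_hull(points: Sequence[Tuple[int, ...]]) -> List[Constraint]:
    """Tightest octagon containing the points: bound each direction by its max."""
    n = len(points[0])
    return [(d, max(sum(c * v for c, v in zip(d, p)) for p in points))
            for d in octagon_directions(n)]
```

For instance, `((-1, 1), 1)` in the result for (1, 2) reads −x + y ≤ 1, i.e. the constraint x − y ≥ −1 of the list above.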

Notice that in APRON equality constraints are used in the Polyhedra domain, whereas they are not explicit in the Interval and Octagon domains.

An important fact about the three domains mentioned above is that each element of the lattice is the intersection of a convex subset of $\mathbb{Q}^n$ with $\mathbb{Z}^n$. To be able to reason about integer points from nonconvex sets, we will use sets of such elements in the next section.
Fig. 1. An ICE sample and its separators using different abstract domains (panels: (a) An ICE sample, (b) Intervals (int), (c) Octagons (oct)).


4 Generating Attributes from Sample Separators 


We define in this section algorithms for generating a set of attributes that can 
be used for constructing decision trees representing candidate invariants. Given 
an ICE sample, these algorithms are based on constructing separators of the two 
sets of positive and negative states that are consistent with the implications in 
the sample. These separators are sets of intervals, octagons or polyhedra. The set of all constraints that define these sets is collected as a set of attributes.


4.1 Abstract Sample Separators 


Let $S = (S^+, S^-, S^\rightarrow)$ be an ICE sample, and let $\mathcal{A}_X = (D_X, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ be an abstract domain. Intuitively, a separator is a set of abstract elements that together contain all positive states, contain no negative state, and are consistent with the implications. Formally, an $\mathcal{A}_X$-separator of $S$ is a set $\mathcal{S} \in 2^{D_X}$ such that $\forall p \in S^+.\ \exists d \in \mathcal{S}.\ p \in d$, $\forall p \in S^-.\ \forall d \in \mathcal{S}.\ p \notin d$, and $\forall p \rightarrow q \in S^\rightarrow.\ \forall d \in \mathcal{S}.\ (p \in d \Rightarrow \exists d' \in \mathcal{S}.\ q \in d')$.

Given a set of positive states $S^+$, we define the basic separator $\mathcal{S}_{basic}$ as $\{\{p\} \mid p \in S^+\}$, where each state is alone in its set. Our method for generating attributes for the learning process is based on computing a special type of separators called join-maximal. An $\mathcal{A}_X$-separator $\mathcal{S}$ is join-maximal if it is not possible to take the join of two of its elements without including a negative state: $\forall d_1, d_2 \in \mathcal{S}.\ d_1 \neq d_2 \Rightarrow (\exists n \in S^-.\ n \in d_1 \sqcup d_2)$.


Example 2. Let us consider again the ICE sample $S$ given in Example 1. Figure 1 shows the borders of join-maximal $\mathcal{A}_X$-separators for $S$ for different abstract domains (Intervals int, Octagons oct, and Polyhedra poly).


Remark 1. An ICE sample may have multiple join-maximal separators as Fig. 2 
shows for the polyhedra domain. The method presented in the next section 
computes one of them non-deterministically. 


4.2 Computing a Join-Maximal Abstract Separator 


We present in this section a basic algorithm for computing a join-maximal $\mathcal{A}_X$-separator for a given sample $S$. Such a separator can be computed iteratively, starting from $\mathcal{S}_{basic}$ and, at each step, choosing two elements $d_1$ and $d_2$ in the current separator such that $d_1 \sqcup d_2$ does not contain a negative state in $S^-$ (this can be checked using the meet operation $\sqcap$), and replacing $d_1$ and $d_2$ by $d_1 \sqcup d_2$. Then, if any element of the separator contains the source $p$ of an implication $p \rightarrow q$, which means that $p$ is now considered a positive state, then, since $q$ must also be considered positive, the element $\{q\}$ must be added to the separator if $q$ is not already in some element of the current separator. When no new join operation (without including negative states) can be done, the obtained set is necessarily a join-maximal $\mathcal{A}_X$-separator of $S$. This procedure corresponds to Algorithm 4.

Fig. 2. Different join-maximal separators for the same sample.


Input : An ICE sample S = (S+, S-, S→) and an abstract domain A_X = (D_X, ⊑, ⊔, ⊓, ⊥, ⊤).
Output: 𝒮 a join-maximal A_X-separator of S.
1 Proc CONSTRUCTSEPARATOR(S, A_X)
2     𝒮 ← S_basic (* = {{s} | s ∈ S+} *);
3     while true do
4         if ∃a, b ∈ 𝒮. a ≠ b ∧ ∀n ∈ S-. n ∉ a ⊔ b then
5             𝒮 ← (𝒮 \ {a, b}) ∪ {a ⊔ b};
6             while ∃p → q ∈ S→. ∃d ∈ 𝒮. p ∈ d ∧ ∀d' ∈ 𝒮. q ∉ d' do
7                 𝒮 ← 𝒮 ∪ {{q}};
8         else break;
9     return 𝒮;

Algorithm 4: Computing a join-maximal A_X-separator.
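Algorithm 4 can be sketched in Python for the interval domain, where a box is a tuple of (lo, hi) pairs, join is the componentwise hull, and "the join contains no negative state" is tested by direct membership. This is a minimal sketch under our own assumptions, not the authors' implementation: all function names are hypothetical, and the nondeterministic choice of the pair (a, b) is resolved by taking the first joinable pair in list order. As in Algorithm 4, implications are re-examined only after a join.

```python
from typing import List, Optional, Set, Tuple

Point = Tuple[int, ...]
Box = Tuple[Tuple[int, int], ...]  # one (lo, hi) interval per variable


def point_box(p: Point) -> Box:
    return tuple((v, v) for v in p)


def join(a: Box, b: Box) -> Box:
    return tuple((min(l1, l2), max(h1, h2)) for (l1, h1), (l2, h2) in zip(a, b))


def contains(box: Box, p: Point) -> bool:
    return all(lo <= v <= hi for (lo, hi), v in zip(box, p))


def joinable_pair(sep: List[Box], neg: Set[Point]) -> Optional[Tuple[int, int]]:
    """First pair of distinct elements whose join contains no negative state."""
    for i in range(len(sep)):
        for j in range(i + 1, len(sep)):
            hull = join(sep[i], sep[j])
            if not any(contains(hull, n) for n in neg):
                return i, j
    return None


def construct_separator(pos: Set[Point], neg: Set[Point],
                        impl: Set[Tuple[Point, Point]]) -> List[Box]:
    sep = [point_box(p) for p in sorted(pos)]      # S_basic
    while True:
        pair = joinable_pair(sep, neg)
        if pair is None:                            # no join possible:
            return sep                              # sep is join-maximal
        i, j = pair
        merged = join(sep[i], sep[j])
        sep = [d for k, d in enumerate(sep) if k not in (i, j)] + [merged]
        # p -> q with p covered but q uncovered: add {q} (lines 6-7)
        added = True
        while added:
            added = False
            for p, q in sorted(impl):
                if any(contains(d, p) for d in sep) and \
                        not any(contains(d, q) for d in sep):
                    sep.append(point_box(q))
                    added = True


def box_constraints(box: Box, names: Tuple[str, ...]) -> Set[str]:
    """Interval constraints of a box, i.e. Constr^int(d), as strings."""
    out: Set[str] = set()
    for (lo, hi), x in zip(box, names):
        out |= {f"{x} >= {lo}", f"{x} <= {hi}"}
    return out
```

On the sample of Example 1, this sketch returns the two boxes [1,3] × [1,4] and [5,6] × [1,4]; their interval constraints are exactly the set Attributes assumed in that example.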


Notice that, instead of starting with the basic separator $\mathcal{S}_{basic}$ defined above, one can start with any separator $\mathcal{S}_{init} \supseteq \mathcal{S}_{basic}$ whose additional sets contain only states which are known to be positive (for example, the initial states).


Example 3. Consider again the sample $S$ of Example 2. We show how the separators of $S$ in Fig. 1 are constructed using Algorithm 4. The algorithm starts from the basic separator $\mathcal{S}_{basic}$, where every positive state in $S$ is alone (Fig. 3(a)). It picks two elements in that separator, e.g. $\{d_1\}$ and $\{d_2\}$. As their join does not include negative states, $\{d_1\}$ and $\{d_2\}$ are replaced by $j_1 = \{d_1\} \sqcup \{d_2\}$ to get a new separator (Fig. 3(b)). Then, depending on the considered domain, different separators are obtained. For Intervals, the join of $j_1$ and $\{d_3\}$ leads to the separator in Fig. 1(a). Notice that both ends of the implication $(2,2) \rightarrow (2,3)$ are included in $j_1 \sqcup \{d_3\}$. In the case of Octagons, the join of $j_1$ and $\{d_3\}$ is the set on the left of Fig. 1(b). Again, both ends of the implication $(2,2) \rightarrow (2,3)$ are included in $j_1 \sqcup \{d_3\}$. In the case of Polyhedra, $j_2 = j_1 \sqcup \{d_3\}$ is the triangle shown in Fig. 3(c). Since $(2,2)$ is included in $j_2$ but $(2,3)$ is not, the element $\{(2,3)\}$ is added to the separator, leading to the separator represented in Fig. 3(c). In the next iteration, $j_2$ is joined with another element of the separator, leading to the separator shown in Fig. 3(d). Finally, a similar iteration of join operations leads to the rectangle including the four points, and this leads to the join-maximal separator of Fig. 1.

Fig. 3. The first iterations of Algorithm 4 on the sample S of Fig. 1.


Remark 2. In the best case, Algorithm 4 performs $|S^+|$ join and $|S^+|(|S^-| + |S^\rightarrow|)$ meet operations (all pairs of points can be joined and no left end-point of an implication lies in the newly joined convex sets). In the worst case, it performs $O((|S^+| + |S^\rightarrow|)^2)$ join and $O((|S^+| + |S^\rightarrow|)^2 (|S^-| + |S^\rightarrow|))$ meet operations (at most $|S^-| + |S^\rightarrow|$ meets are needed to check whether two sets can be joined, and implications might add new points to $S^+$). The cost of meet and join depends on the abstract domain used: in the number of variables, it is polynomial for intervals and octagons and exponential for polyhedra. Algorithm 4 is not designed to compute a join-maximal separator with a minimal number of convex sets, as this would require a potentially exponential number of meet and join operations.


4.3 Integrating Separator Computation in ICE-DT 


We use the computation of a join-maximal separator to provide an instance of the function GENERATEATTRIBUTES of ICE-DT in Algorithm 2. Given a sample $S$, let $\mathcal{S}$ be the $\mathcal{A}_X$-separator of $S$ computed by CONSTRUCTSEPARATOR($S$, $\mathcal{A}_X$) defined by Algorithm 4. We consider the set InitialAttributes containing all the predicates that constitute the specification ($\mathit{Init}$ and $\mathit{Good}$) and those that appear in the program (as tests in conditional statements and while loops). Then, we define:

$$\textsc{GenerateAttributes}(S) = \mathit{InitialAttributes} \cup \bigcup_{d \in \mathcal{S}} \mathit{Constr}(d)$$


Remark 3. Several convex sets of the separator $\mathcal{S}$ might generate the same constraint, and the set of attributes generated in this way might contain attributes which partition the state space in the same way (e.g. $x \le 0$ and $x \ge 1$, the latter being equivalent to $x > 0$ over the integers). We keep only one of them. The number of attributes generated is at most linear in the number of positive states in the sample $S$.
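The deduplication of Remark 3 can be sketched as follows for integer constraints a·x ≤ b encoded as pairs `(a, b)`. This is our own illustration with hypothetical names: a constraint is normalized by dividing its coefficients by their gcd and flooring the bound (valid over the integers), its negation is −a·x ≤ −b − 1, and two attributes are treated as inducing the same partition when they share the canonical key min(c, ¬c).

```python
from functools import reduce
from math import gcd
from typing import List, Tuple

Constraint = Tuple[Tuple[int, ...], int]  # (a, b) encodes a . x <= b over Z


def normalize(c: Constraint) -> Constraint:
    """Divide by the gcd of the coefficients; floor the bound (exact over Z)."""
    a, b = c
    g = reduce(gcd, (abs(v) for v in a), 0)
    if g == 0:
        raise ValueError("constraint without variables")
    return tuple(v // g for v in a), b // g


def negate(c: Constraint) -> Constraint:
    """not(a . x <= b) is a . x >= b + 1, i.e. -a . x <= -b - 1 over Z."""
    a, b = c
    return tuple(-v for v in a), -b - 1


def dedup_attributes(attrs: List[Constraint]) -> List[Constraint]:
    """Keep the first attribute of each class inducing the same partition."""
    seen, kept = set(), []
    for c in attrs:
        n = normalize(c)
        key = min(n, normalize(negate(n)))
        if key not in seen:
            seen.add(key)
            kept.append(c)
    return kept
```

With two variables (x, y), the attributes x ≤ 0, x ≥ 1 (encoded as −x ≤ −1) and 2x ≤ 1 all reduce to the same key, so only the first is kept, while y ≤ 3 survives.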
int j, k, t;
assume(j = 2 ∧ k = 0);
while true do
    if t = 0 then j ← j + 4 else j ← j + 2; k ← k + 1;
    assert(k = 0 ∨ j = 2k + 2);

Fig. 4. Example program


Notice that our function GENERATEATTRIBUTES(S), contrary to the one used in the original ICE-DT (Algorithm 2), does not expand a set of existing attributes, and therefore it only needs the sample $S$ as argument. In fact, with our method for computing attributes, the ICE-DT schema can be simplified: the while loop in Algorithm 2 can be replaced by a single initial test on the success condition. Indeed, each time the learner is called, it checks whether the set of attributes computed for the previous sample is sufficient for the new sample; a new separator is generated only when it is not. The subsequent call to the SUFFICIENT function is then needed to extend the sample so that a decision tree can be constructed (see the explanation in Sect. 2.3), but it necessarily succeeds since, in our case, the set of attributes defines by construction a separator of the sample.


Example 4. Consider the program in Fig. 4 whose set of variables is X = {j, k, t}. 
We use Polyhedra. First, starting from an empty ICE-sample, regardless of the 
attributes, the learner proposes true as an invariant and (5,1,0) is returned as 
a negative counterexample. Then, it proposes false and (2,0,0) is returned as 
a positive counterexample. 

Now, Algorithm 4 is called to compute a separator for $S = (S^+ = \{(2,0,0)\}, S^- = \{(5,1,0)\}, S^{\rightarrow} = \emptyset)$. Here, we initially use a separator $\mathcal{S}_{init}$ containing $d_0 = \{(2,0,0)\}$ together with $d_1$, the set of states satisfying the initial condition $j = 2 \wedge k = 0$. Since $d_0 \subseteq d_1$, the algorithm returns the join-maximal separator $\mathcal{S} = \{d_1\}$ with $\mathrm{Constr}^{\mathrm{Poly}}(d_1) = \{j = 2, k = 0\}$.

Using constraints from $\mathcal{S}$ as attributes, the learner constructs the candidate invariant $k = 0$. Then, the teacher provides an implication counterexample $(0,0,1) \rightarrow (2,1,1)$. Now, without computing another separator (as the current one is sufficient for the new sample), the learner proposes $j = 2 \wedge k = 0$ as an invariant, and the implication counterexample $(2,0,1) \rightarrow (4,1,1)$ is returned (and since $(2,0,1)$ is an initial state, $(4,1,1)$ is also considered positive).

Then, Algorithm 4 is called again to construct a separator for the sample $S = (S^+ = \{(2,0,0), (4,1,1)\}, S^- = \{(5,1,0)\}, S^{\rightarrow} = \{(0,0,1) \rightarrow (2,1,1), (2,0,1) \rightarrow (4,1,1)\})$. Starting from a separator $\mathcal{S}_{init} = \{d_0, d_1, d_2\}$ with $d_2 = \{(4,1,1)\}$, it returns the join-maximal separator

$$\mathcal{S} = \{d_3\}, \quad \mathrm{Constr}^{\mathrm{Poly}}(d_3) = \{2k + 2 = j,\ j \le 4,\ j \ge 2\}$$


Based on this separator, the learner proposes $2k + 2 = j$, and $(2,0,0) \rightarrow (6,0,0)$ is given as a counterexample (and then, since $(2,0,0)$ is in $S^+$, $(6,0,0)$ is considered positive). Then, from $\mathcal{S}_{init} = \{d_0, d_1, d_2, d_4\}$ with $d_4 = \{(6,0,0)\}$, a new separator $\mathcal{S}$ is constructed:

$$\mathcal{S} = \{d_5\}, \quad \mathrm{Constr}^{\mathrm{Poly}}(d_5) = \{j + 2k \le 6,\ k \ge 0,\ j \ge 2k + 2\}$$


This leads to a new candidate invariant: $j + 2k \le 6 \wedge j \ge 2k + 2$. The teacher then returns the negative state $(0,-2,0)$. The attributes of $\mathcal{S}$ are still sufficient to construct a decision tree for the sample. Then, the learner proposes $j + 2k \le 6 \wedge k \ge 0 \wedge j \ge 2k + 2$, and the teacher returns the counterexample $(3,0,1) \rightarrow (5,1,1)$ (and since $(5,1,1)$ is a negative state, $(3,0,1)$ is considered negative). The current sample $S'$ is now $(S^+ = \{(2,0,0), (4,1,1), (6,0,0)\}, S^- = \{(5,1,0), (5,1,1), (3,0,1), (0,-2,0)\}, S^{\rightarrow} = \{(0,0,1) \rightarrow (2,1,1), (2,0,1) \rightarrow (4,1,1), (2,0,0) \rightarrow (6,0,0), (3,0,1) \rightarrow (5,1,1)\})$.
Then, from $\mathcal{S}_{init} = \{d_0, d_1, d_2, d_4\}$, a join-maximal separator is constructed:

$$\mathcal{S} = \{d_3, d_4\}, \quad \mathrm{Constr}^{\mathrm{Poly}}(d_4) = \{j = 6,\ t = 0,\ k = 0\}$$


Some iterations later, using only the attributes of the last $\mathcal{S}$, the learner generates the inductive invariant $(t = 0 \wedge 2 \le j \wedge k = 0) \vee (t \neq 0 \wedge 2 \le j \wedge 2k + 2 = j)$.
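As a sanity check on Example 4, the Python sketch below encodes the program of Fig. 4 as predicates over states (j, k, t) and tests the three conditions an invariant must satisfy: it contains the initial states, it is closed under one loop iteration, and it implies the assertion (read here as the property that must hold at the loop head, which is our assumption about Fig. 4). The function names are hypothetical, and the search is a bounded brute-force enumeration standing in for the SMT-based teacher of NIS, so it can refute a candidate but cannot prove one inductive.

```python
from itertools import product
from typing import Callable, Optional, Tuple

State = Tuple[int, int, int]  # (j, k, t)


def init(s: State) -> bool:
    j, k, _ = s
    return j == 2 and k == 0


def good(s: State) -> bool:
    j, k, _ = s
    return k == 0 or j == 2 * k + 2


def step(s: State) -> State:
    """One iteration of the loop in Fig. 4; t is never modified."""
    j, k, t = s
    return (j + 4, k, t) if t == 0 else (j + 2, k + 1, t)


def inv(s: State) -> bool:
    """The inductive invariant reported at the end of Example 4."""
    j, k, t = s
    return (t == 0 and 2 <= j and k == 0) or (t != 0 and 2 <= j and 2 * k + 2 == j)


def find_violation(p: Callable[[State], bool], lo: int = -10,
                   hi: int = 10) -> Optional[Tuple[str, State]]:
    """Test initiation, consecution and safety of p on every state in [lo, hi]^3.

    Returns the first failed condition with a witness state, or None. Passing
    this bounded test does not prove inductiveness over all integers.
    """
    for s in product(range(lo, hi + 1), repeat=3):
        if init(s) and not p(s):
            return ("initiation", s)
        if p(s) and not p(step(s)):
            return ("consecution", s)
        if p(s) and not good(s):
            return ("safety", s)
    return None
```

On the candidate $j + 2k \le 6 \wedge k \ge 0 \wedge j \ge 2k + 2$, the check fails at consecution on a state (3, 0, t) with t ≠ 0, whose successor (5, 1, t) lies outside the candidate; this has the same shape as the counterexample (3,0,1) → (5,1,1) returned by the teacher.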


4.4 Computing Separators Incrementally 


Algorithm 4 of Sect. 4.2 always starts from the initial separator, regardless of what has been done in previous iterations of the ICE learning process. Here, we present an incremental approach that exploits the fact that adding a counterexample to the sample may modify the separator only locally, allowing parts of separators computed in previous iterations to be reused. The basic idea is to store the history of the separator computation along the ICE iterations and to update it according to the new counterexamples discovered at each step.


The Algorithm. We use an abstract stack data structure to represent the history of separators. Along the iterations of the ICE learning algorithm, an increasing sequence of samples $S_i$ is considered (at each iteration, the sample is enriched with the new counterexample provided by the teacher). Then, at each step $i$, a join-maximal separator $\mathcal{S}_i$ of the sample $S_i$ is computed and stored in the stack. Notice that at a given step $i$, separators of index $j < i$ are not necessarily separators of $S_i$, since they may not cover all positive points of $S_i$. Therefore, we introduce the following notion: a partial $\mathcal{A}_X$-separator of a sample $S$ is a set $\mathcal{S} \in 2^{\mathcal{A}_X}$ such that $\forall p \in S^-.\ \forall d \in \mathcal{S}.\ p \notin d$.

Now, to compute the separator $\mathcal{S}_i$, we start from one of the partial separators in the stack: the most recent one that is not affected by the last update of the sample. When the sample at step $i$ is extended with positive states, $\mathcal{S}_i$ can be computed directly from $\mathcal{S}_{i-1}$. However, when the sample is extended with negative states, several previous steps might need to be reconsidered, since some of the elements (convex sets) of their separators might contain states that are now discovered to be negative. In that case, we must return to the step with the greatest index $j < i$ such that $\mathcal{S}_j$ is a partial separator of $S_i$ (i.e., the new knowledge about the negative states does not affect the separation computed at step $j$). Since the sequence of samples is increasing, it is indeed correct to consider the greatest $j < i$ satisfying this property. Therefore, the separator $\mathcal{S}_i$ is computed starting from $\mathcal{S}_j$ augmented with all the positive states in $S_i \setminus S_j$.

This leads to Algorithm 5. In its description, we use a stack P equipped with the usual operations: P.head() returns the top element of the stack, P.pop() removes and returns the top element of the stack, and P.push(e) inserts an element e at the top of the stack. A refined version of Algorithm 5, in which the backtracking phase is made more efficient, is presented in the full paper [2]: we attach information to each join-created object in order to track its join-predecessors (the objects involved in its creation) in the stack.


Global: P = {∅}, a stack of partial separators.
Proc CONSTRUCTSEPARATORINC(S_i = (S_i^+, S_i^-, S_i^→), A_X)
    // backtracking
    while true do
        if ∃n ∈ S_i^-. ∃d ∈ P.head(). n ∈ d then
            P.pop();
        else break;
    // expansion
    𝒮 ← P.head();
    add ← {p ∈ S_i^+ | ∀d ∈ 𝒮. p ∉ d} ∪ {q | ∃p → q ∈ S_i^→. ∃d′ ∈ 𝒮. p ∈ d′ ∧ ∀d″ ∈ 𝒮. q ∉ d″};
    while ∃s ∈ add do
        add ← add \ {s};
        if ∃d ∈ 𝒮. ∀n ∈ S_i^-. n ∉ d ⊔ {s} then
            let o = d ⊔ {s};
            𝒮 ← (𝒮 \ {d}) ∪ {o};
            for p → q ∈ S_i^→ s.t. p ∈ o ∧ ∀d′ ∈ 𝒮. q ∉ d′ do
                add ← add ∪ {q}
        else
            𝒮 ← 𝒮 ∪ {{s}};
            for p → q ∈ S_i^→ s.t. p = s ∧ ∀d′ ∈ 𝒮. q ∉ d′ do
                add ← add ∪ {q}
    P.push(𝒮);
    return 𝒮;


Algorithm 5: Incremental computation of an Ax-separator of a sample S. 
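To make the control flow of Algorithm 5 concrete, here is a minimal Python sketch that follows its backtracking and expansion phases, under a simplifying assumption: abstract objects are integer boxes (the interval domain), so the join d ⊔ {s} is a componentwise min/max. This is not the NIS implementation, which relies on APRON domains; the class and method names are hypothetical, and a sample is passed as a set of positive states, a set of negative states, and a set of implication pairs.

```python
from typing import List, Set, Tuple

State = Tuple[int, ...]
Box = Tuple[Tuple[int, int], ...]  # one closed interval [lo, hi] per variable


def singleton(s: State) -> Box:
    return tuple((v, v) for v in s)


def join(b: Box, s: State) -> Box:
    """Join in the interval domain: the smallest box containing b and s."""
    return tuple((min(lo, v), max(hi, v)) for (lo, hi), v in zip(b, s))


def contains(b: Box, s: State) -> bool:
    return all(lo <= v <= hi for (lo, hi), v in zip(b, s))


class IncrementalSeparator:
    def __init__(self) -> None:
        # Global stack P of partial separators; the bottom entry is the empty separator.
        self.stack: List[List[Box]] = [[]]

    def construct(self, pos: Set[State], neg: Set[State],
                  impl: Set[Tuple[State, State]]) -> List[Box]:
        # Backtracking: pop separators that contain a state now known to be negative.
        # The empty separator at the bottom contains nothing, so the loop terminates.
        while any(contains(d, n) for d in self.stack[-1] for n in neg):
            self.stack.pop()
        sep = list(self.stack[-1])  # copy, so the stored partial separator is untouched
        impls = sorted(impl)        # a fixed order keeps the sketch deterministic

        def covered(q: State) -> bool:
            return any(contains(d, q) for d in sep)

        # Expansion: uncovered positives, plus uncovered targets of implications
        # whose source is already covered.
        add = sorted(p for p in pos if not covered(p))
        for p, q in impls:
            if covered(p) and not covered(q) and q not in add:
                add.append(q)

        while add:
            s = add.pop(0)
            # Look for an element whose join with s stays free of negative states.
            idx = next((i for i, d in enumerate(sep)
                        if not any(contains(join(d, s), n) for n in neg)), None)
            if idx is not None:
                o = join(sep[idx], s)
                sep[idx] = o
                new = [q for p, q in impls if contains(o, p) and not covered(q)]
            else:
                sep.append(singleton(s))
                new = [q for p, q in impls if p == s and not covered(q)]
            for q in new:
                if q not in add:
                    add.append(q)

        self.stack.append(sep)
        return sep
```

Replaying the samples of Examples 4 and 5 with this sketch exercises the backtracking phase, but with boxes the separator built from (2,0,0) and (4,1,1) also contains the negative state (3,0,1), so two separators are popped where the polyhedral run of Example 5 pops only one. This illustrates how the choice of domain affects how far the algorithm backtracks.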


Integration into ICE-DT. The function CONSTRUCTSEPARATORINC can be integrated into the ICE-DT algorithm just as the function CONSTRUCTSEPARATOR in Sect. 4.3, by using it to implement the function GENERATEATTRIBUTES of the learner. This time, however, the learner computes the separator from which the attributes are extracted more efficiently.


Example 5. Consider again the program in Fig. 4 of Example 4. The first two iterations are similar to the ones described in Example 4. The obtained sample is then $S = (S^+ = \{(2,0,0)\}, S^- = \{(5,1,0)\}, S^{\rightarrow} = \emptyset)$. Starting from the empty separator, Algorithm 5 computes the separator $\mathcal{S}_1 = \{d_1\}$ where $\mathrm{Constr}^{\mathrm{Poly}}(d_1) = \{j = 2, k = 0\}$. Then, the learner proceeds as in the previous example to get the sample $S = (S^+ = \{(2,0,0), (4,1,1)\}, S^- = \{(5,1,0)\}, S^{\rightarrow} = \{(0,0,1) \rightarrow (2,1,1), (2,0,1) \rightarrow (4,1,1)\})$. To build a separator of $S$, Algorithm 5 starts from $\mathcal{S}_1$ and produces $\mathcal{S}_2 = \{d_3\}$ where $d_3 = d_1 \sqcup \{(4,1,1)\}$.
Tool         | Solved (safe) | Solved (unsafe) | Total
-------------|---------------|-----------------|------
ICE-DT       | 111           | 11              | 122
LoopInvGen   | 130           | 8               | 138
CVC4         | 129           | -               | 129
Spacer       | 118           | 18              | 136
NIS(int)     | 106           | 17              | 123
NIS(oct)     | 122           | 14              | 136
NIS(poly)    | 137           | 17              | 154
NIS(VB)      | 143           | 17              | 160

             | NIS(int) | NIS(oct) | NIS(poly)
-------------|----------|----------|----------
NIS(int)     | -        | 7        | 2
NIS(oct)     | 13       | -        | 5
NIS(poly)    | 31       | 18       | -


Fig. 5. Benchmark results and comparison of NIS wrt. different abstract domains. 


Similarly, when the counterexample $(2,0,0) \rightarrow (6,0,0)$ is obtained, the algorithm starts directly from $\mathcal{S}_2$ to produce $\mathcal{S}_3 = \{d_5\}$ where $d_5 = d_3 \sqcup \{(6,0,0)\}$.

After two more iterations, the sample is the same as $S'$ in Example 4. At this point, $\mathcal{S}_3$ cannot be used to construct a separator for $S'$, since $d_5$ includes the negative state $(3,0,1)$. The algorithm therefore removes $\mathcal{S}_3$ from the stack. It checks that $\mathcal{S}_2$ is a partial separator of $S'$, which is indeed the case. Then, it constructs a new separator $\mathcal{S}_4$ based on $\mathcal{S}_2$ by expanding it with the counterexamples received after the construction of $\mathcal{S}_2$ (the negative state $(0,-2,0)$ and the implications $(2,0,0) \rightarrow (6,0,0)$ and $(3,0,1) \rightarrow (5,1,1)$): $\mathcal{S}_4 = \{d_3, d_4\}$ where $\mathrm{Constr}^{\mathrm{Poly}}(d_4) = \{t = 0, k = 0, j = 6\}$. The rest of the execution proceeds as with Algorithm 4. Here, the advantages of the incremental method are: (1) while positive examples are added, the separators are simply expanded, and (2) when a negative example is added at step 4, only one join operation has to be undone.


5 Experiments 


We have implemented the prototype tool NIS (Numerical Invariant Synthesizer) using our method for attribute synthesis with the ICE-DT schema. NIS, written in C++, is configurable with an abstract domain for the manipulation of abstract objects. It uses Z3 [24] for SMT queries and APRON's [18] abstract domains.

We compare our implementation with ICE-DT¹, LoopInvGen, CVC4, and Spacer². LoopInvGen is a data-driven invariant inference tool based on a syntactic enumeration of candidate predicates [25,26]. It is written in OCaml and uses Z3 as an SMT solver. CVC4 uses an enumerative refutation-based approach [1,27]. It is written in C++ and includes an SMT solver. Spacer is a PDR-based CHC solver [19], written in C++ and integrated in Z3.


1 The original ICE-DT tool [16] does not support programs in the SyGuS format. Here 
we use our own implementation of ICE-DT. It shares with NIS all the components 
(teacher, decision tree learning algorithm with implications) except that attribute 
discovery is enumerative. 

2 Spacer does not support programs in the SyGuS format; a wrapper is written in 
C++ that converts a SyGuS program to a CHC problem and supplies it to Spacer 
via the Z3 FixedPoint API. 
The evaluation was done on 164 linear integer arithmetic (LIA) programs³ from SyGuS-Comp'19, each with between 2 and 10 variables. The experiments were carried out using a timeout of 1800 s (30 min) for each example. They were conducted on a machine with 4 Intel(R) Xeon(R) CPUs at 2.13 GHz (16 cores) and 128 GB RAM, running Linux CentOS 7.9.

Figure 5 shows the number of safe and unsafe programs solved by each tool. The instance of our approach using the Polyhedra abstract domain solves 154 programs out of 164, and the virtual best of our approach over the three abstract domains (Intervals, Octagons, and Polyhedra) solves 160 programs out of 164. Two of the remaining examples require handling quantifiers, which the current implementation cannot do. The other two are not solved by any of the four tools we considered.

Overall, these results show that our approach is powerful and solves a significant number of cases that the other tools cannot. Interestingly, different abstract domains lead to incomparable performance: although more cases are solvable with polyhedra, some cases are solvable only with intervals or octagons. Also, while operations on intervals and octagons have a lower complexity than those on polyhedra, this is compensated by the greater expressiveness of polyhedra. Indeed, this expressiveness often allows invariants to be found quickly where a less expressive domain needs many more iterations to learn them. Figure 5 also shows the number of programs that can be solved using one abstract domain but not another. Polyhedra are globally superior, but the three domains are complementary.

Compared to the other tools, the bottleneck of ICE-DT and LoopInvGen is the number of predicates generated by enumeration. Our approach avoids an explosion of the attribute pool by guiding attribute discovery with the data sample, and by reducing the size of the computed separators (replacing objects by their join) from which constraints are extracted. CVC4 uses enumerative refutation techniques, which are also subject to an explosion problem; moreover, CVC4 cannot solve unsafe programs. The performance of Spacer depends on its ability to generalize the set of predecessors computed using model-based projection, and on the interpolants used for separation from bad states in the context of IC3/PDR. While this is done efficiently in general, there are cases where this process leads to expensive computations, whereas our technique can be much faster using a small number of join operations on positive states.

The scatter plots in Fig. 6 compare the execution times of our approach using the Polyhedra abstract domain, NIS(poly), with those of LoopInvGen, CVC4, and Spacer (with a timeout of 1800 s for each example). They show that NIS(poly) is in general faster than both LoopInvGen and CVC4, and that its execution times are comparable to those of Spacer. We have also compared the original ICE-DT, based on enumerative attribute generation using octagonal templates (as in [16]), with NIS(oct). The comparison shows that our tool is significantly faster (see the bottom-right subfigure of Fig. 6).

3 Other programs from SyGuS-Comp'19 have not been taken into account in our evaluation, as they are either boolean programs with integer variables for encoding nondeterminism, or artificial programs augmented with useless variables and statements.

Fig. 6. Runtime of NIS(poly) vs. LoopInvGen, CVC4, and Spacer, and NIS(oct) vs. ICE-DT.


6 Conclusion 


We have defined an efficient method for generating relevant predicates for the learning process of numerical invariants. The approach is guided by the data sample built during the process and is based on constructing a separator of the sample. The construction consists of an iterative application of join operations in numerical abstract domains in order to cover positive states without including negative ones. Our method is tightly integrated into the ICE-DT schema, leading to an efficient data-driven invariant synthesis and verification algorithm.


Future work includes several directions. First, alternative methods for constructing separators should be investigated in order to reduce the size of the pool of attributes along the learning process while increasing their potential relevance. Another issue to investigate is the control of the counterexamples provided by the teacher, since they play an important role in the learning process. In our current implementation, their choice depends entirely on the SMT solver used to implement the teacher. Finally, we intend to extend this approach to other types of programs, in particular to programs with other data types and programs with more general control structures, such as procedural programs.
Abstract. Bounded Model Checking (BMC) is a widely used strategy for program verification, and it has been explored extensively over the past decade. Despite this long history, BMC still faces scalability challenges as programs continue to grow larger and more complex. One approach that has proven effective in verifying large programs is Counterexample Guided Abstraction Refinement (CEGAR). In this work, we propose a complementary approach to CEGAR for bounded model checking of sequential programs: in contrast to CEGAR, our algorithm gradually widens underapproximations of a program, guided by proofs of unsatisfiability. We implemented our ideas in a tool called LEGION. We compare the performance of LEGION against that of CORRAL, a state-of-the-art verifier from Microsoft that uses the CEGAR strategy. We conduct our experiments on 727 Windows and Linux device driver benchmarks. We find that LEGION solves 12% more instances than CORRAL and that LEGION exhibits behavior complementary to that of CORRAL. Motivated by this, we also build a portfolio verifier, LEGION+, that attempts to draw the best of LEGION and CORRAL. Our portfolio, LEGION+, solves 15% more benchmarks than CORRAL under similar computational resource constraints (i.e., each verifier in the portfolio is run with a time budget that is half of the time budget of CORRAL). Moreover, it is 2.9× faster than CORRAL on benchmarks solved by both CORRAL and LEGION+.


Keywords: Verification · Bounded model checking · Underapproximation widening


1 Introduction 


Bounded Model Checking (BMC) [11,20,26,33] is a popular option for program verification, primarily due to its ability to side-step the need to synthesize complex invariants. BMC harnesses the power of modern SMT solvers to verify a bounded set of behaviors of a program. The user, if interested, may re-attempt verification with larger bounds once the program is proven correct with small bounds.

© The Author(s) 2022

BMC operates by constructing a logical formula that symbolically captures all states reachable by a program under a user-provided bound. A query, referred to as the verification condition (VC), is constructed as a conjunction of the program semantics and the negation of the property, which is also expressed as a logical formula. If the VC is satisfiable, some program execution violates the property of interest, so the program is faulty. If it is unsatisfiable, the program satisfies the property, i.e., the program is safe under the chosen bound.
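To illustrate the shape of this query, the sketch below replaces the SMT solver with exhaustive enumeration over a small input domain for a hypothetical toy program (not taken from the paper). Searching for an execution that satisfies the program semantics and violates the property is exactly asking whether the VC is satisfiable, and the answer depends on the bound: the toy program is safe when the loop is unrolled at most twice but not three times.

```python
from typing import Iterable, Optional, Tuple


def bmc(bound: int, inputs: Iterable[int] = range(8)) -> Optional[Tuple[int, int]]:
    """Bounded check of the toy program
           y := 0; while (*) { y := y + x }; assert y < 20
    where x is an input and the loop is unrolled at most `bound` times.
    Returns (x, iterations) for an execution violating the assertion, else None.
    """
    for x in inputs:
        y = 0
        for n in range(bound + 1):
            # State after n iterations; the nondeterministic loop may exit here,
            # so the assertion is checked at every unrolling depth.
            if not (y < 20):
                return (x, n)
            y += x
    return None
```

Here `bmc(2)` finds no violation, whereas `bmc(3)` reports the input x = 7, which reaches y = 21 after three iterations.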

However, for large programs, BMC faces scalability challenges, as the verification condition tends to grow large, posing difficulties for the SMT solver. Prior work has answered this challenge with the popular counterexample-guided abstraction refinement (CEGAR) strategy: start with the VC for an abstraction of the program, and incrementally refine the abstraction until the program is decided as safe or faulty. The Stratified Inlining (SI) [26] algorithm is an instance of this strategy. SI starts with an abstraction of only the entry procedure of the program, and then incrementally inlines callees, guided by counterexamples. Not surprisingly, the dynamic inlining strategy of SI has been found to be significantly more scalable than algorithms that statically inline all procedures [25]. The SI algorithm is used in practice by the CORRAL [24] verifier that powers Microsoft's Static Driver Verifier (SDV) [4].

In this work, we propose a new algorithm that uses proofs of unsatisfiabil- 
ity to widen underapproximate models of the program en route to verification 
of sequential programs. Our algorithm starts off by constructing a partial ver- 
ification condition for only the program entry procedure and blocks all paths 
that invoke calls to procedures that have not yet been inlined. This constructs 
an underapproximation of the original program (because paths are blocked). A 
satisfiable result on an underapproximation will indicate the presence of a bug. 
If the VC is unsatisfiable, we examine its proof of unsatisfiability in order to 
guide the inlining of called procedures. The program can be declared safe when 
the proof of unsatisfiability does not depend on any procedure call that has not 
been inlined yet. We implemented our ideas in a tool called LEGION. 

Further, we found that our underapproximation widening algorithm and the abstraction refinement strategy (used by CORRAL) exhibit complementary behaviors: many programs that CORRAL struggles on yield to the underapproximation-based technique, and vice versa. This observation motivated us to build a portfolio verifier, LEGION+, that runs both techniques in parallel. We found that the portfolio is more effective than either tool alone (with similar computational resources, i.e., each verifier in the portfolio is run with a time budget that is half of the time budget of CORRAL). Both LEGION and LEGION+ are available open-source on the legion branch of the corral repository¹.

Our experiments are conducted on 727 Windows and Linux device driver benchmarks on which CORRAL struggles, i.e., CORRAL is unable to solve any of these benchmarks in less than 200 s. We find that LEGION solves 12% more instances than CORRAL with a time budget of 2 h per instance. Further, the portfolio verifier LEGION+, given half the time budget of CORRAL, solves 15% more benchmarks than CORRAL, and it is 2.9× faster than CORRAL on benchmarks solved by both CORRAL and LEGION+.

1 https://github.com/boogie-org/corral.git (branch: legion).

The primary contributions of this paper are as follows: 


– We design a new algorithm, Underapproximation Widening guided Stratified Inlining, that uses proof-based artifacts to widen underapproximate models (in contrast to using counterexamples to refine overapproximate models).

– We implement our ideas in a tool called LEGION for bounded program verification.

– We also design a portfolio verifier, LEGION+, that combines overapproximation refinement and underapproximation widening to verify a program, in an attempt to reap the benefits of both worlds.

– We evaluate both LEGION and LEGION+ on a set of 727 programs from Windows Device Drivers [31] from the SDV test suite and Linux Device Drivers from the SV-COMP [7] benchmarks.


2 Background 


This section presents background material that we use in the rest of the paper. 

A logical formula is built from literals. A literal is either a variable or the negation of a variable. A logical formula in Conjunctive Normal Form (CNF) is a conjunction of clauses, where each clause is a disjunction of literals. Given a logical formula, a satisfiability solver returns whether the formula is satisfiable (SAT) or unsatisfiable (UNSAT). If a formula is SAT, the solver provides a model in the form of a satisfying assignment of the variables. If a formula is UNSAT, the solver returns an unsatisfiable core (unsat core), which is a subset of the clauses of the input formula whose conjunction is still UNSAT.
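These definitions can be exercised with a brute-force solver. The sketch below is purely illustrative and exponential in the number of variables: it represents clauses DIMACS-style as lists of nonzero integers, returns a model for satisfiable input, and, for unsatisfiable input, shrinks the clause set by deletion (a clause is dropped whenever the remaining clauses stay UNSAT). The result is a minimal unsat core; the function names are our own, and production SAT/SMT solvers typically return cores that are not necessarily minimal.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style: literal v means variable v, -v its negation


def solve(cnf: List[Clause]) -> Optional[Dict[int, bool]]:
    """Return a satisfying assignment, or None if cnf is UNSAT (brute force)."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, bits))
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf):
            return model
    return None


def unsat_core(cnf: List[Clause]) -> List[Clause]:
    """Deletion-based minimal unsat core: drop each clause if the rest stays UNSAT."""
    if solve(cnf) is not None:
        raise ValueError("formula is satisfiable; it has no unsat core")
    core = list(cnf)
    i = 0
    while i < len(core):
        trial = core[:i] + core[i + 1:]
        if solve(trial) is None:
            core = trial      # clause i is not needed for unsatisfiability
        else:
            i += 1            # clause i is necessary; keep it
    return core
```

For example, `unsat_core([[1], [2, 3], [-1], [-2]])` keeps only the clauses `[1]` and `[-1]`.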


2.1 Language Model 


We consider a programming language that represents a passified form of BOOGIE programs [8]. A program consists of multiple procedures (Proc). We assume an entry-point procedure called main where program execution starts. Each procedure can have any number of local variable declarations followed by a series of basic blocks (BasicBlock). We assume that local variables are initially unconstrained. A basic block is labeled by a unique identifier and consists of multiple statements (Stmt) followed by a single control statement (ControlStmt). A control statement is either a goto, which takes a sequence of basic block labels and non-deterministically picks one to jump to, or a return, which returns control to the caller. Returning from main terminates the program execution. A statement is either an assume command or a procedure call. The statement (assume φ) allows a feasible execution only if φ holds.
procedure main() {
  int x, y, z; bool c;
  L0: assume x == 0;
      assume y == 0;
      goto L1, L2;
  L1: assume c;
      call foo(x, y);
      goto L3;
  L2: assume !c;
      call bar(x, y);
      goto L3;
  L3: assume y != 0;
      return;
}

procedure foo(int x, int y) {
  bool d;
  L5: goto L6, L7;
  L6: assume d;
      call foo1(x, y);
      goto L8;
  L7: assume !d;
      call foo2(x, y);
      goto L8;
  L8: return;
}

procedure foo1(int x, int y) {
  L9: assume y == x + 1;
      return;
}

procedure foo2(int x, int y) {
  L10: assume y == x - 1;
       return;
}

procedure bar(int x, int y) {
  bool e;
  L11: goto L12, L13;
  L12: assume e;
       call bar1(x, y);
       goto L14;
  L13: assume !e;
       call bar2(x, y);
       goto L14;
  L14: return;
}

procedure bar1(int x, int y) {
  L15: assume y == x + 10;
       return;
}

procedure bar2(int x, int y) {
  L16: assume y == x - 10;
       return;
}


Fig. 1. A passified program 


We leave the set of variable types (Type) and expressions (Expr) unspecified. 
In practice, we can use any expression language that can be directly encoded 
in SMT. Our implementation uses linear arithmetic, fixed-size bit-vectors, unin- 
terpreted functions, and extensional arrays. This combination is sufficient to 
realistically translate C programs into our language representation [21,24]. 

Note that the programs we consider do not have global variables, return parameters of procedures, or assignments. These restrictions are without loss of generality [23]. Conversion of these additional features into our language representation is readily available in tools like BOOGIE. A passified program makes it easy to describe the verification-condition generation process.

Given a program P, we consider the verification question of whether there exists a terminating execution of P. To be precise, we are interested in finding out whether there is any execution of main that reaches its return statement. If no such execution exists, then P is considered verified, or SAFE. Otherwise, we say that P is UNSAFE and return the execution trace with concrete variable values along the trace. Note that we consider a bounded version of the verification problem, i.e., we require that P does not contain any loops or recursive procedure calls. All such loops and recursive calls must be unrolled to a predetermined depth before proceeding with verification; thus, the verification problem becomes decidable (if the expression language of the program is decidable) [23].

Fig. 2. Call graph of the program in Fig. 1.

Fig. 3. Partial VC of main()


2.2 VC Generation for a Procedure 


Consider a procedure baz that does not contain any procedure calls. This section outlines one way of verifying baz, i.e., finding out whether it has a terminating execution. We use a process called Verification Condition (VC) generation on baz to construct a logical formula and feed it to an SMT solver. If the formula is UNSAT, then the return statement in baz is unreachable and baz is SAFE. Otherwise, we extract the satisfying model from the SMT solver, construct the execution trace, and return UNSAFE along with the trace. We now outline the VC-generation process.

Suppose that baz takes some input arguments. For each basic block $j$ in baz, we define a boolean variable $\mathit{blk}_j$, termed the control-flow variable. Let $\mathit{st}_j$ denote the conjunction of all assume statements in basic block $j$. Let $\mathit{successor}(j)$ denote the targets of the goto statement in $j$, i.e., all the successor basic blocks in baz to which control may jump non-deterministically from $j$. Let $i_j$ be a unique integer constant representing basic block $j$. We also define an uninterpreted function $\mathit{flow} : \mathbb{Z} \to \mathbb{Z}$ that records the non-deterministic choice of the successor basic block of $j$. Given the above, we construct a logical formula $\varphi_j$ for each basic block $j$ as follows:
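$$\varphi_j := \mathit{blk}_j \Rightarrow \Big(\mathit{st}_j \wedge \bigvee_{s \in \mathit{successor}(j)} \big(\mathit{blk}_s \wedge (i_s == \mathit{flow}(i_j))\big)\Big)$$

If basic block $j$ ends with a return statement instead of a goto, then $\varphi_j$ is:

$$\varphi_j := \mathit{blk}_j \Rightarrow \mathit{st}_j$$

Assuming the first basic block of baz, where procedure execution begins, is labeled $s$, the VC of baz is constructed as follows:

$$\mathit{blk}_s \wedge \bigwedge_{l \in \mathit{basicblocks}(\mathit{baz})} \varphi_l$$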


In Fig. 3, we show the VC of main of the program in Fig. 1 as an example, 
where we ignore the procedure calls in main (i.e., treat them as (assume true)). 
We term such a VC (of a procedure where its calls are skipped) as the partial 
VC (pVC) of the procedure. 
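The encoding above is mechanical enough to sketch. The Python helper below (hypothetical, not taken from BOOGIE or CORRAL) emits the VC of a call-free procedure as an SMT-LIB 2 term, using the integer literals 0, 1, 2, ... as the block constants $i_j$ and assuming each block's assume statements are already written as SMT-LIB formulas. The script it produces could be handed to an SMT-LIB solver to answer the reachability question; the sketch itself only builds text.

```python
from typing import Dict, List, Sequence, Tuple

# A block: (assume formulas in SMT-LIB syntax, successor labels).
# An empty successor list means the block ends with `return`.
Block = Tuple[List[str], List[str]]


def conj(xs: Sequence[str]) -> str:
    if not xs:
        return "true"
    return xs[0] if len(xs) == 1 else "(and " + " ".join(xs) + ")"


def disj(xs: Sequence[str]) -> str:
    if not xs:
        return "false"
    return xs[0] if len(xs) == 1 else "(or " + " ".join(xs) + ")"


def procedure_vc(blocks: Dict[str, Block], entry: str) -> str:
    """blk_entry conjoined with phi_j for every block j, as an SMT-LIB term."""
    ids = {label: n for n, label in enumerate(blocks)}  # the constants i_j
    phis: List[str] = []
    for j, (assumes, succ) in blocks.items():
        st = conj(assumes)
        if not succ:
            # return block: blk_j => st_j
            phis.append(f"(=> blk_{j} {st})")
        else:
            # goto block: blk_j => (st_j and OR_s (blk_s and i_s == flow(i_j)))
            choices = [f"(and blk_{s} (= {ids[s]} (flow {ids[j]})))" for s in succ]
            phis.append(f"(=> blk_{j} {conj([st, disj(choices)])})")
    return conj([f"blk_{entry}"] + phis)


def smtlib_script(blocks: Dict[str, Block], entry: str, var_decls: List[str]) -> str:
    """Declarations, the VC assertion, and a satisfiability query."""
    decls = [f"(declare-const blk_{j} Bool)" for j in blocks]
    decls.append("(declare-fun flow (Int) Int)")
    body = [f"(assert {procedure_vc(blocks, entry)})", "(check-sat)"]
    return "\n".join(var_decls + decls + body)
```

Applied to main of Fig. 1 with its calls skipped, i.e., blocks L0 to L3 with the assumes `(= x 0)`, `(= y 0)`, `c`, `(not c)`, and `(not (= y 0))`, this produces a textual form of the partial VC described above.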


2.3 Static Versus Dynamic Inlining 


Given a program P with a starting procedure main, one simple way to verify P would be to construct the VC of main by inlining all the procedure calls and check the satisfiability of VC(main) with an SMT solver. However, such a static inlining strategy can cause an exponential blowup in the size of the VC. Hence, we instead use a dynamic inlining algorithm, called Stratified Inlining (SI) [26], which employs a Counterexample Guided Abstraction Refinement (CEGAR) technique [14] to dynamically inline procedure VCs. It has been shown that dynamic inlining scales better than static inlining [25]. Dynamic inlining produces more compact VCs during abstraction refinement, which leads to significantly faster program verification.
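To see where the exponential blowup of static inlining comes from, one can count how many procedure bodies the fully inlined VC contains. The sketch below is a hypothetical helper that memoizes over an acyclic call graph. For the call graph of Fig. 2 it gives seven bodies (main plus one per dynamic callsite), whereas a chain of 11 procedures in which each calls the next one twice gives 2^11 - 1 = 2047 copies, even though the call graph itself stays linear in size.

```python
from functools import lru_cache
from typing import Dict, List


def inlined_instances(callgraph: Dict[str, List[str]], entry: str) -> int:
    """Count procedure bodies in the fully inlined VC of `entry`.

    `callgraph` maps a procedure to its callees, one entry per callsite, and
    must be acyclic (loops and recursion already unrolled).
    """
    @lru_cache(maxsize=None)
    def count(proc: str) -> int:
        return 1 + sum(count(callee) for callee in callgraph.get(proc, []))

    return count(entry)
```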


2.4 Verification with Stratified Inlining 


The working of SI is shown in Algorithm 1. For the sake of simplicity, let us assume that each basic block in P contains at most one procedure call. Every program point from which a procedure is called is termed a callsite. For example, main in Fig. 1 has two callsites: foo and bar, which are called from basic blocks L1 and L2, respectively. A static instance of a callsite is denoted by a pair (l, c), where l denotes the identifier of the basic block from which a call to the procedure c is made. A dynamic callsite is defined as a stack of static callsites, which represents the runtime stack during a program's execution, with main at the bottom of the stack. For example, the dynamic callsite corresponding to the call to foo from L1 in main is given by [main, (L1, foo)]. The call graph of the program in Fig. 1 is shown in Fig. 2.
Algorithm 1: Stratified Inlining (SI) algorithm.

Input: program P with starting procedure main
Input: An SMT solver S
Output: SAFE, or UNSAFE(τ)
1  C ← {[main, s] | s ∈ callsites(main)}
2  S.Assert(pVC(main, [main]))
3  while true do
4      outcome ← OVERREFSTEP(P, C, S)
5      if outcome == SAFE ∨ outcome == UNSAFE(τ) then
6          return outcome
7      else
8          let NODECISION(_, C′) = outcome
9          C ← C′


The SI algorithm takes as input a program P with a starting procedure main 
and an SMT solver S. Initially, we add the dynamic callsites in main to a list C 
(Line 1) and then inline main, i.e., assert the pVC of main (Line 2). The callsites 
in C are termed as open callsites because they have not yet been inlined. The 
above steps construct an abstraction of P. The SI algorithm then iteratively 
calls the OVERREFSTEP routine on this abstraction (Line 4) to perform gradual 
refinement until we can reach a decision about whether P is SAFE or not. Each 
invocation of OVERREFSTEP can potentially inline more procedures by asserting 
their partial VC to the solver S. Thus, the state of the solver, as well as the set of open callsites C, changes across invocations of OVERREFSTEP. We discuss the Overapproximation Refinement Guided Stratified Inlining (OverRefSI) strategy used by the OVERREFSTEP routine in Sect. 2.5.


2.5 Overapproximation Refinement Guided Stratified Inlining 


The OVERREFSTEP routine given in Algorithm 2 demonstrates the inner work- 
ings of the OverRefSI strategy at each verification step. The OverRefSI strat- 
egy [26] for verifying a program works by iteratively firing overapproximation 
queries and gradually refining the abstraction of P. If the query returns UNSAT, 
then we can conclude that P is SAFE with respect to the given property. Other- 
wise, we extract all the open callsites that appear on the counterexample trace 
and refine the abstraction of P by inlining these callsites. If the counterexample 
trace contains no open callsites, then P is UNSAFE and we return the verdict 
along with the counterexample trace. 

The OVERREFSTEP routine takes as input a program P, a set of open callsites C, and an SMT solver S. The OVERREFSTEP routine is called iteratively in order to verify the safety of P. We demonstrate the working of OverRefSI to verify the pVC of main of Fig. 1 in Table 1. At the beginning, the SI algorithm asserts the pVC of main to S and adds [main, (L1, foo)] and [main, (L2, bar)] to the list of open callsites C in step 0.
Algorithm 2: OVERREFSTEP(P, C, S)

Input: procedure P, set of callsites C, SMT solver S
Output: SAFE, UNSAFE(trace), NODECISION(τ, C)
1  // Overapproximate check
2  if S.Check() == UNSAT then
3      return SAFE
4  else
5      τ ← opencallsites(S.Model())
6      if τ == ∅ then
7          return UNSAFE(S.Model())
8      else
9          C′ ← ∅
10         forall c ∈ τ do
11             C′ ← C′ ∪ INLINE(P, c)
12         C ← (C − τ) ∪ C′
13         return NODECISION(τ, C)


Next, the SI algorithm calls OVERREFSTEP with P, C, and S as arguments.
OVERREFSTEP fires an overapproximation query in Line 2. If the query is
unsatisfiable, we return the SAFE verdict. If the query is satisfiable, we obtain
the counterexample trace and extract all the open callsites on the trace into τ
(Line 5). If τ is empty, i.e., the counterexample trace contains no open callsites,
then the trace is not spurious and we return an UNSAFE verdict with the trace
(Line 7). Otherwise, we inline all the callsites in τ and collect in C′ all the new
callsites opened up by these inlinings (Line 11). Inlining a callsite c consists
of asserting the partial VC of the procedure invoked from c.

Subsequently, the inlined callsites are removed from the list of open callsites C,
and the new callsites opened up by the inlinings are added to C (Line 12).
For example, in step 1 of Table 1, OVERREFSTEP fires an overapproximation
query that returns SAT with a counterexample trace containing the callsite
of foo, i.e., [main, (L1, foo)]. This callsite is then inlined by asserting the pVC
of foo to the solver, which opens up the callsites of foo1 and foo2. Since we cannot
yet decide whether P is safe at this step, a NODECISION verdict is returned
along with the list of inlined callsites τ and the new list of open callsites C
(Line 13).
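To make the steps above concrete, here is a toy, solver-free sketch of OVERREFSTEP. It assumes a hypothetical propositional encoding (not the program of Fig. 1): each callsite has a control variable that is true when the counterexample passes through it, an open callsite is left unconstrained (overapproximated), and inlining a callsite adds its callee's clauses and opens the callee's own callsites. A brute-force enumerator stands in for the SMT solver, and "the open callsites on the trace" are taken to be the open callsites whose control variable is true in the model. The names `VAR`, `INLINE`, and `over_ref_step` are illustrative only.

```python
from itertools import product

def brute_force_model(clauses):
    """Return the first satisfying assignment {var: bool} of a tiny CNF, or None (UNSAT)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return value
    return None

# Hypothetical toy program: a control variable per callsite, and for each callsite
# the clauses asserted and the callsites opened when it is inlined.
VAR = {"c1": 1, "c2": 2, "c3": 3}
MAIN_CLAUSES = [[1, 2]]            # a violation must pass through c1 or c2
INLINE = {
    "c1": ([[-1]], set()),         # callee of c1 cannot violate the property
    "c2": ([[-2, 3]], {"c3"}),     # callee of c2 violates it only through c3
    "c3": ([[-3]], set()),         # callee of c3 is safe
}

def over_ref_step(clauses, open_callsites):
    """One OverRefSI step; open callsites stay unconstrained (overapproximated)."""
    model = brute_force_model(clauses)
    if model is None:
        return "SAFE", clauses, open_callsites
    tau = {c for c in open_callsites if model.get(VAR[c], False)}  # open callsites on the trace
    if not tau:
        return "UNSAFE", clauses, open_callsites  # trace avoids all open callsites: real bug
    new_open = set()
    for c in sorted(tau):
        new_clauses, opened = INLINE[c]
        clauses = clauses + new_clauses            # assert the callee's pVC
        new_open |= opened
    return "NODECISION", clauses, (open_callsites - tau) | new_open

clauses, open_cs, verdict, steps = MAIN_CLAUSES, {"c1", "c2"}, "NODECISION", 0
while verdict == "NODECISION":
    verdict, clauses, open_cs = over_ref_step(clauses, open_cs)
    steps += 1
print(verdict, steps)  # SAFE 4
```

With this encoding the loop inlines c2, then c3, then c1, and the fourth query is UNSAT. If inlining c3 added no clauses, the third step would return UNSAFE, because the model would then pass only through inlined callsites.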

Next, the SI algorithm calls OVERREFSTEP again; in step 2, it fires another
overapproximation query, which returns SAT with a counterexample trace
containing the open callsite of foo1, which we inline by asserting the pVC
of foo1. The verification process continues in this way, inlining the open
callsites on the counterexample trace in every step, which gradually refines the
pVC of main. Finally, in step 7, the overapproximation query returns UNSAT,
from which we conclude that main is SAFE.
Table 1. Execution of OverRefSI on the program of Fig. 1

Step | Action | Open callsites
0 | Assert pVC(main) | [main, (L1, foo)]; [main, (L2, bar)]
1 | Overapprox check: SAT; Assert pVC(foo) | [main, (L2, bar)]; [main, (L1, foo), (L6, foo1)]; [main, (L1, foo), (L7, foo2)]
2 | Overapprox check: SAT; Assert pVC(foo1) | [main, (L2, bar)]; [main, (L1, foo), (L7, foo2)]
3 | Overapprox check: SAT; Assert pVC(foo2) | [main, (L2, bar)]
4 | Overapprox check: SAT; Assert pVC(bar) | [main, (L2, bar), (L12, bar1)]; [main, (L2, bar), (L13, bar2)]
5 | Overapprox check: SAT; Assert pVC(bar1) | [main, (L2, bar), (L13, bar2)]
6 | Overapprox check: SAT; Assert pVC(bar2) | (none)
7 | Overapprox check: UNSAT; Return SAFE | (none)
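The bookkeeping C ← (C − τ) ∪ C′ can be replayed on Table 1 with a few lines of Python. This sketch represents a callsite as a call-stack tuple mirroring the notation [main, (L1, foo)]; the static call graph `CALLS` is inferred from the callsites listed in Table 1 (foo1, foo2, bar1, and bar2 are assumed to make no calls), and the callsite inlined at each step is read off the Action column. The helper names are illustrative, not taken from CORRAL.

```python
# Call graph inferred from the callsites listed in Table 1:
# for each procedure, the (label, callee) pairs of the calls it makes.
CALLS = {
    "main": [("L1", "foo"), ("L2", "bar")],
    "foo": [("L6", "foo1"), ("L7", "foo2")],
    "bar": [("L12", "bar1"), ("L13", "bar2")],
    "foo1": [], "foo2": [], "bar1": [], "bar2": [],  # assumed to make no calls
}

def opened_by(stack):
    """Callsites opened by inlining the procedure at the end of a call-stack tuple."""
    proc = stack[-1][1] if len(stack) > 1 else stack[0]
    return {stack + ((label, callee),) for label, callee in CALLS[proc]}

def inline_all(open_cs, tau):
    """C <- (C - tau) | C', where C' collects the callsites opened by inlining tau."""
    new_open = set()
    for c in tau:
        new_open |= opened_by(c)
    return (open_cs - tau) | new_open

FOO, BAR = ("main", ("L1", "foo")), ("main", ("L2", "bar"))
inlined_per_step = [                  # read off the Action column of Table 1
    {FOO}, {FOO + (("L6", "foo1"),)}, {FOO + (("L7", "foo2"),)},
    {BAR}, {BAR + (("L12", "bar1"),)}, {BAR + (("L13", "bar2"),)},
]

open_cs = opened_by(("main",))        # step 0: assert pVC(main)
history = [open_cs]
for tau in inlined_per_step:          # steps 1 to 6
    open_cs = inline_all(open_cs, tau)
    history.append(open_cs)

print([len(s) for s in history])      # [2, 3, 2, 1, 2, 1, 0]
```

Each entry of `history` matches the Open callsites column of the corresponding step; the set is empty after step 6, so the query in step 7 involves no overapproximated calls.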


3 Overview 


3.1 Underapproximation Widening 


We propose a novel algorithm, Underapproximation Widening Guided Stratified
Inlining (UnderWidenSI), that uses proofs of unsatisfiability to guide stratified
inlining. UnderWidenSI maintains an underapproximated model of the target
program and widens it until either the program is verified as safe or a bug is
found.

We illustrate the UnderWidenSI strategy in Figs. 4a to 4d. Assume that we
are trying to verify whether some required property holds on a program. The
yellow ovals show the reachable program states, while the red ovals depict error
states, in which the required property does not hold. The objective of a
verification algorithm is to construct a model of the program that is precise
enough either to show that the program can reach an error state or to prove
that the error states are unreachable. Figures 4a to 4c show a safe program,
while Fig. 4d depicts an unsafe program.

Consider Fig. 4a: the UnderWidenSI algorithm starts off with the partial
verification condition of the entry procedure and "blocks" executions through
all its open callsites.


Fig. 4. How UnderWidenSI works 


Definition (Blocked callsites). By blocking a callsite C, we mean that all
paths that reach C are deemed infeasible. That is, blocking a callsite has the
effect of replacing the callsite by (assume false).

Essentially, blocking callsites creates underapproximations of the set of feasible
program paths. Such underapproximated VCs can be constructed by asserting
additional blocking clauses on the control-flow variables of the open callsites.
These blocks make certain program states unreachable. For example, in Fig. 4a,
we construct an underapproximated model of the program by blocking the open
callsites C1 and C2. The inner green oval depicts the program states that are
reachable in the underapproximated model, whereas the outer gray regions
depict the states that are unreachable due to the blocks on C1 and C2.
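Blocking can be illustrated on an explicit set of paths. In this hypothetical sketch, a program is given as a list of feasible paths, each recorded as the callsites it passes through plus a flag for whether it violates the property; blocking a callsite drops every path through it, as (assume false) would. The callsite names C1 to C4 and the paths themselves are made up for illustration.

```python
# Hypothetical toy program: every feasible path, as (callsites on the path, violates property?)
PATHS = [
    (("C1",), False),
    (("C2", "C3"), True),
    (("C2", "C4"), False),
    ((), False),  # a path that makes no calls
]

def feasible_paths(blocked):
    """Blocking a callsite acts like `assume false`: drop every path that reaches it."""
    return [(path, err) for path, err in PATHS if not set(path) & set(blocked)]

under = feasible_paths({"C1", "C2"})  # block both open callsites
wider = feasible_paths({"C1"})        # unblock C2 only

# Both models are underapproximations: subsets of the original path set.
assert set(under) <= set(wider) <= set(PATHS)
print(any(err for _, err in under), any(err for _, err in wider))  # False True
```

Blocking both C1 and C2 leaves only the call-free path, while unblocking C2 admits the violating path through C2 and C3. Because blocking only removes paths, any violation found in a blocked model is also a violation of the original program.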

If the verification query on this model (the conjunction of the underapproximated
model and the negation of the property) returns SAT, then an error state is
indeed reachable. On the other hand, if the query returns UNSAT (as shown in
Fig. 4a), we need to widen the model to admit additional executions. We guide
this widening by extracting the reason for the unsatisfiability from a minimal
unsat core² of the query, which yields a set of blocking clauses; the callsites
corresponding to these blocking clauses constitute


² Although there may exist multiple minimal unsat cores, we found in preliminary
experiments that the choice of unsat core does not have a significant impact
on the overall runtime of our algorithm (on average).
Fig. 5. How OverRefSI works


a reason why the current underapproximated model cannot reach any of the
error states. Hence, we widen the model by unblocking exactly these callsites,
which yields a wider model (see Fig. 4b). Widening by inlining C2 is a stratified
inlining step and hence may open up new callsites, say C3 and C4.
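The unsat-core-guided choice of callsites can be sketched with blocking clauses passed as assumption literals. The sketch below is an illustration under assumptions, not LEGION's code: it uses a brute-force satisfiability check for tiny CNFs (integers as literals, as in DIMACS) and the standard deletion-based algorithm for a minimal unsat core over the assumptions; production SMT solvers such as Z3 can instead report unsat cores in terms of assumption literals. The clauses are hypothetical: variable 1 (resp. 2) is the control variable of C1 (resp. C2), the clause [1, 2] says an error can only be reached through C1 or C2, and [-1] says the surrounding code already rules out errors along paths through C1.

```python
from itertools import product

def is_sat(clauses):
    """Brute-force satisfiability check for a tiny CNF over integer literals."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False

def minimal_core(clauses, assumptions):
    """Deletion-based minimal unsat core over assumption literals.

    Precondition: clauses plus all assumptions (as unit clauses) is UNSAT."""
    core = list(assumptions)
    for lit in list(core):
        trial = [a for a in core if a != lit]
        if not is_sat(clauses + [[a] for a in trial]):
            core = trial  # lit is not needed for unsatisfiability
    return core

# Hypothetical query: control variables 1 and 2 belong to open callsites C1 and C2.
CLAUSES = [[1, 2], [-1]]
BLOCKS = {-1: "C1", -2: "C2"}  # blocking-clause literal -> callsite it blocks

assert not is_sat(CLAUSES + [[b] for b in BLOCKS])  # blocked model: UNSAT
core = minimal_core(CLAUSES, list(BLOCKS))
print([BLOCKS[lit] for lit in core])  # ['C2']
```

Dropping the block on C1 keeps the query UNSAT, whereas dropping the block on C2 makes it SAT, so the minimal core names only C2, the callsite to unblock and inline. Deletion-based cores are minimal but not necessarily minimum, and a different deletion order may return a different core, which matches the observation that several minimal cores can exist.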

We proceed in the same manner by blocking these open callsites and repeating
the query. Finally, in Fig. 4c, we construct an underapproximated model that
still does not intersect the error states. However, in this case, the unsat core
does not contain any blocking clause, as none of the currently blocked callsites
would have allowed widening in the direction of the error states.

The unsat core provides a direction for widening towards the error states.
It also allows us to declare the program safe without having to widen the
model to encompass all reachable program states: if the verification query is
UNSAT and the unsat core contains no blocking clause, the query remains UNSAT
even when every blocked callsite is left unconstrained (overapproximated), which
is a sufficient condition to declare the program safe.

Figure 4d shows how our algorithm proceeds for a faulty program: it incre- 
mentally widens the model in the direction of the error states till an error state 
R is reached. At this point, the UnderWidenSI algorithm declares the program 
as unsafe. 

Let us now contrast the UnderWidenSI strategy with the OverRefSI strategy,
popularly known as counterexample-guided abstraction refinement (CEGAR),
which currently drives the SI algorithm in CORRAL. OverRefSI starts off with
an overapproximated model of the program: the pVC of the entry procedure
with all callsites replaced by non-deterministic updates to their sets of modified
variables. For example, in Fig. 5a, OverRefSI constructs an abstract program
(an overapproximated model) M1 of the program by overapproximating the
open callsites. If the resulting verification condition is SAT, it examines the
generated counterexample to check whether it is spurious. If the counterexample
is found to be a true bug, it declares the program unsafe. If the counterexample
is spurious, the model is refined to eliminate it. For example, in Fig. 5a, we find
that there exists an error state (counterexample) P within M1, where the property
can be violated. Hence, OverRefSI refines M1 in Fig. 5a by
inlining the overapproximated callsites through which P is reachable. The
refinement is done to rule out P as a counterexample, i.e., P becomes unreachable
after refinement. We observe in Fig. 5a that after the first round of refinement,
P is no longer reachable in the overapproximated model M2; however, we can still
find another counterexample Q. Hence, the abstraction M2 is refined again. The
program is declared safe when the model cannot reach any error state. Note that
the algorithm can prove the safety of the program without having to capture
the exact set of reachable program states.

OverRefSI and UnderWidenSI are complementary: while OverRefSI main- 
tains an overapproximated model and refines the model (shrinking the set of 
reachable states), UnderWidenSI maintains an underapproximated model and 
widens the model (expanding the set of reachable states) incrementally. In terms 
of the algorithmic details, the OverRefSI algorithm in CORRAL uses the models 
(the counterexamples) to drive refinements, whereas our UnderWidenSI algo- 
rithm uses the proof (the unsat core) to guide the widenings. 


4 Algorithms 


4.1 Underapproximation Widening Guided Stratified Inlining (UnderWidenSI)


The UNDERWIDENSTEP routine in Algorithm 3 demonstrates how the
UnderWidenSI strategy works in each verification step. It takes as input a
procedure P, a set of open callsites C, and an SMT solver S. The SI algorithm
calls the UNDERWIDENSTEP routine iteratively (instead of OVERREFSTEP in
Line 4) in order to verify the safety of P.

In the beginning, we construct an underapproximated pVC of the input
procedure P by blocking all calls through the open callsites in C (Line 4). Next,
we fire an underapproximation query (Line 5). If the query returns SAT, we
return the verdict UNSAFE with the counterexample trace (Line 6). Otherwise,
we obtain the minimal unsatisfiable core uc and extract into μ all the blocked
callsites that appear in uc (Line 8).

If μ does not contain any blocked callsites, we deduce that P is SAFE; the
proof of the safety of P is captured by uc, so we return the verdict SAFE
(Line 11). Otherwise, each of the callsites in μ is inlined (Line 15), which
constructs a refinement of P. The inlined callsites are then removed from the
list of open callsites C, and the new callsites opened up by the inlinings are
added to C (Line 16).

When the algorithm is unable to decide whether P is safe, it returns a verdict
of NODECISION along with the list of inlined callsites μ and the new list of
open callsites C (Line 17).
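The UNDERWIDENSTEP routine can likewise be sketched in Python on a toy encoding. This is an illustration under assumptions rather than LEGION's implementation: the program is a hypothetical propositional encoding (not Fig. 1) in which each callsite has a control variable, `INLINE` lists the clauses and newly opened callsites produced by inlining each callsite, blocking is modeled by unit clauses ¬v(c), the solver push/pop is implicit because the blocking clauses are added only to a temporary query, and the unsat core is computed by brute-force deletion. All names are illustrative.

```python
from itertools import product

def is_sat(clauses):
    """Brute-force satisfiability check for a tiny CNF over integer literals."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False

def minimal_core(clauses, assumptions):
    """Deletion-based minimal unsat core over assumption literals (query must be UNSAT)."""
    core = list(assumptions)
    for lit in list(core):
        trial = [a for a in core if a != lit]
        if not is_sat(clauses + [[a] for a in trial]):
            core = trial
    return core

# Hypothetical toy program: a control variable per callsite, and for each callsite
# the clauses asserted and the callsites opened when it is inlined.
VAR = {"c1": 1, "c2": 2, "c3": 3}
MAIN_CLAUSES = [[1, 2]]            # a violation must pass through c1 or c2
INLINE = {
    "c1": ([[-1]], set()),         # callee of c1 cannot violate the property
    "c2": ([[-2, 3]], {"c3"}),     # callee of c2 violates it only through c3
    "c3": ([[-3]], set()),         # callee of c3 is safe
}

def under_widen_step(clauses, open_callsites):
    """One UnderWidenSI step; returns (verdict, clauses, open callsites)."""
    blocks = [-VAR[c] for c in sorted(open_callsites)]  # assert ¬ControlVariable(c)
    if is_sat(clauses + [[b] for b in blocks]):
        return "UNSAFE", clauses, open_callsites  # trace uses inlined callsites only
    core = set(minimal_core(clauses, blocks))
    mu = {c for c in open_callsites if -VAR[c] in core}
    if not mu:
        return "SAFE", clauses, open_callsites    # proof needs no blocking clause
    new_open = set()
    for c in sorted(mu):
        new_clauses, opened = INLINE[c]
        clauses = clauses + new_clauses            # assert the callee's pVC
        new_open |= opened
    return "NODECISION", clauses, (open_callsites - mu) | new_open

clauses, open_cs, verdict, steps = MAIN_CLAUSES, {"c1", "c2"}, "NODECISION", 0
while verdict == "NODECISION":
    verdict, clauses, open_cs = under_widen_step(clauses, open_cs)
    steps += 1
print(verdict, steps)  # SAFE 3
```

On this encoding the first query needs both blocks, the second needs only the block on c3, and the third query has no blocks and is UNSAT with an empty core, so the sketch returns SAFE after three steps. If inlining c3 added no clauses, the third query would be SAT and the sketch would return UNSAFE.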


Example. We demonstrate the workings of UnderWidenSI when verifying the pVC
of main of Fig. 1 in Table 2. Initially, we assert the pVC of main and add
[main, (L1, foo)] and [main, (L2, bar)] to the list of open callsites in step 0.
Algorithm 3: UNDERWIDENSTEP(P, C, S)

Input: procedure P, set of callsites C, SMT solver S
Output: SAFE, UNSAFE(trace), NODECISION(μ, C)
1  // Underapproximate check
2  S.Push()
3  forall c ∈ C do
4      S.Assert(¬ControlVariable(c))
5  if S.Check() == SAT then
6      return UNSAFE(S.Model())
7  else
8      μ ← BlockedCallsites(S.UnsatCore())
9      S.Pop()
10     if μ == ∅ then
11         return SAFE
12     else
13         C′ ← ∅
14         forall c ∈ μ do
15             C′ ← C′ ∪ INLINE(P, c)
16         C ← (C − μ) ∪ C′
17         return NODECISION(μ, C)


Replacing each of the open callsites with an (assume false) statement, i.e.,
blocking them, constructs an underapproximation of the program. If an SMT
solver query on this underapproximation returns SAT, then the program is
definitely UNSAFE, as the satisfying model can only represent an execution trace
that goes through inlined callsites. In that case, we can return the verdict
UNSAFE along with an error trace constructed from the model. On the other hand,
if the underapproximation check returns UNSAT, then we cannot immediately
return a verdict on the safety of the program.

Following this, in step 1 (see Table 2), we push a new frame on the solver
and assert (¬blk1, ¬blk2) to block executions through the callsites of foo and
bar, respectively, which constructs the underapproximated pVC of main. We
query the solver with these constraints. Figure 1 shows that if we block executions
through basic blocks L1 and L2, the program cannot terminate, i.e., the return
statement in L3 is not reachable. Hence, the solver returns UNSAT. The reason
for the unsatisfiability is blocking executions through both L1 and L2.

To widen the underapproximated model of the program so that we may
reach L3, we need to remove the block on at least one of them and inline the
respective callsite. The unsat core, in this case, contains the callsite of bar
in basic block L2. Therefore, we pop the earlier solver frame containing the
blocking clauses and assert (blk2 ⇒ pVC(bar)) in the solver. Inlining bar opens
up the callsites [main, (L2, bar), (L12, bar1)] and [main, (L2, bar), (L13, bar2)].

Next, in step 2, we again construct the underapproximated pVC of main 
by blocking executions through the callsites of foo, bar1 and bar2. The solver 
Table 2. Execution of UnderWidenSI on the program of Fig. 1

Step | Action | Open callsites
0 | Assert pVC(main) | [main, (L1, foo)]; [main, (L2, bar)]
1 | Underapprox check: UNSAT; Assert pVC(bar) | [main, (L1, foo)]; [main, (L2, bar), (L12, bar1)]; [main, (L2, bar), (L13, bar2)]
2 | Underapprox check: UNSAT; Assert pVC(foo); Assert pVC(bar1); Assert pVC(bar2) | [main, (L1, foo), (L6, foo1)]; [main, (L1, foo), (L7, foo2)]
3 | Underapprox check: UNSAT; Assert pVC(foo1); Assert pVC(foo2) | (none)
4 | Underapprox check: UNSAT; Return SAFE | (none)


query returns UNSAT with uc containing the callsites of foo, bar1 and bar2 
which are inlined. 

In step 3, the callsites of foo1 and foo2 are open. Blocking both of these
callsites and performing an underapproximation check returns UNSAT with uc
containing the callsites of foo1 and foo2, which are then inlined.

In step 4, the underapproximation query returns UNSAT and uc contains 
no blocked callsites. This points to the fact that uc contains only inlined call- 
sites, i.e., starting from step 0 if we only inline the callsites in uc and leave the 
remaining callsites overapproximated, we will still get an UNSAT. Therefore, uc 
is the proof of the safety of the program and we return the verdict that the pVC 
of main is safe. 

Note that when the underapproximation query returns SAT, the counterexample
trace is constructed on the underapproximated program, i.e., the trace can pass
only through inlined callsites, since all paths through blocked callsites are
infeasible. The underapproximated program represents a subset of the paths in
the original program; therefore, any counterexample trace present in the
underapproximated program is also present in the original program. Hence, if
the underapproximated program is unsafe, so is the original program.

We have implemented the UnderWidenSI algorithm in LEGION. We compare 
the performance of the UnderWidenSI algorithm in LEGION against that of 
CORRAL which uses OverRefSI. 


4.2 Portfolio Technique 


The complementary behavior of the OverRefSI and UnderWidenSI algorithms
motivates us to design a portfolio approach for verifying a program. The
portfolio strategy incorporates both the OverRefSI algorithm used by CORRAL
and the UnderWidenSI algorithm implemented in LEGION. We refer to the
portfolio verifier as LEGION+. For each program, LEGION+ runs both CORRAL
and LEGION in parallel, terminates verification as soon as one of them finishes,
and reports its outcome. We discuss the performance of LEGION+ against that
of CORRAL and LEGION in Sect. 5.
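The portfolio idea of running both verifiers and keeping whichever answers first can be sketched with Python's standard `concurrent.futures` module. This is a minimal sketch, not how LEGION+ is implemented: the verifiers are hypothetical callables that poll a `threading.Event` so that the losing verifier can be stopped cooperatively (Python threads cannot be killed), and `fake_verifier` merely sleeps to simulate work. A portfolio of separate tools would more naturally run them as separate processes and terminate the slower one.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def portfolio(verifiers, budget_s):
    """Run all verifiers in parallel; report the first verdict, then stop the rest.

    Each verifier is a callable taking a threading.Event that it polls to stop early."""
    stop = threading.Event()
    pool = ThreadPoolExecutor(max_workers=len(verifiers))
    futures = {pool.submit(run, stop): name for name, run in verifiers.items()}
    done, _ = wait(futures, timeout=budget_s, return_when=FIRST_COMPLETED)
    stop.set()                 # ask the remaining verifiers to terminate
    pool.shutdown(wait=True)   # join the worker threads
    if not done:
        return None, "TIMEOUT"
    first = next(iter(done))
    return futures[first], first.result()

def fake_verifier(verdict, work_units, unit_s=0.01):
    """Hypothetical stand-in for CORRAL or LEGION: sleeps in small units, polling `stop`."""
    def run(stop):
        for _ in range(work_units):
            if stop.is_set():
                return "CANCELLED"
            time.sleep(unit_s)
        return verdict
    return run

name, verdict = portfolio(
    {"CORRAL": fake_verifier("SAFE", 100), "LEGION": fake_verifier("SAFE", 5)},
    budget_s=5.0,
)
print(name, verdict)  # LEGION SAFE
```

In this sketch, `budget_s` plays the role of the per-instance time budget given to the portfolio as a whole.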


5 Experimental Results 


We have built a tool, LEGION, that implements our UnderWidenSI algorithm.
To compare against OverRefSI, we use CORRAL [26], a state-of-the-art verifier
used at Microsoft [24]. We also build a portfolio solver, LEGION+, that runs both
CORRAL and LEGION in parallel. Whenever one of the tools finishes verification,
LEGION+ terminates the other and reports the outcome.

We compare the performance of CORRAL against LEGION and LEGION+ on
a suite of Windows and Linux device driver benchmarks. The Windows device
driver benchmarks are obtained by running Static Driver Verifier (SDV) [4] on
real Windows device drivers that exercise all features of the C language, such as
arrays, heaps, pointers, loops, and recursion. SDV compiles these drivers into a
suite of BOOGIE [8] programs, each of which is a device driver paired with a
property (the compilation is detailed in [24]). Note that, although the Windows
device drivers compiled into BOOGIE programs are available as SDV
benchmarks [31], the actual C programs are internal to Microsoft.

Along with these, we also use a set of Linux device drivers that are available as
C programs in the SVCOMP benchmark suite [7]. We used SMACK [36] to
compile the Linux device drivers into BOOGIE programs. Overall, we run our
experiments on 727 hard programs from the SDV and SVCOMP benchmarks,
i.e., programs that CORRAL takes more than 200 s to verify or on which it times
out. We set the timeout for each verification task to 2 h for both CORRAL and
LEGION. For all verification tasks, we use an unrolling length of 3, as advised in
the benchmarks [31] and used in other works [11].

As LEGION+ uses twice the computational resources of CORRAL and LEGION,
we halve its time budget to 1 h to make a fair comparison. We also report the
performance of LEGION+ with a 2 h time budget (which can be seen as the
virtual best of CORRAL and LEGION).

The experiments were performed on a machine with an AMD EPYC 7452
processor (48 cores) and 384 GB of RAM. Both CORRAL and LEGION use Z3 [15]
as the underlying SMT solver. We used Z3's default setting of a fixed random
seed for all our experiments, after verifying that the choice of random seed
does not have any significant impact on our results.


5.1 Corral Versus Legion 


Figure 6 depicts the number of instances solved within the time budget by
CORRAL and LEGION. In Fig. 6, a point (x, y) denotes the number of instances x,
Fig. 6. Number of instances solved within time (in hours) for CORRAL vs LEGION vs
LEGION+.


Table 3. Total time taken by each verifier to solve instances

Verifier | Solved instances | Total time taken
CORRAL | 262 | 109 h
LEGION | 351 | 112 h
LEGION+ | 369 | 71 h


each of which was solved within time y. As we can observe, CORRAL solves
262 out of 727 instances (36%) with a time budget of 2 h per instance,
whereas LEGION solves 351 instances (48%) with the same time budget. Both of
them fail to solve 330 instances (45%). Out of the 397 instances (55%) that are
solved by either CORRAL or LEGION, 46 instances (12%) are solved exclusively
by CORRAL, whereas 135 instances (34%) are solved exclusively by LEGION.
The scatter plot of verification times across LEGION and CORRAL is shown
in Fig. 7. The spread in the scatter plot demonstrates that the two tools
complement each other: the benchmarks on which CORRAL struggles are sometimes
handled well by LEGION, and vice versa. Picking the better of the two verifiers
solves a total of 397 out of 727 instances (55%). This motivated the design of LEGION+.
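The instance counts above are related by inclusion-exclusion; the following Python snippet recomputes the derived figures from three reported totals (solved by CORRAL, solved by LEGION, solved by neither). The value `both`, the number of instances solved by both tools, is not stated in the text and is derived here.

```python
# Counts reported in Sect. 5.1 (727 instances, 2 h budget per instance).
total, corral, legion, neither = 727, 262, 351, 330

either = total - neither              # solved by at least one tool
both = corral + legion - either       # inclusion-exclusion
only_corral = corral - both
only_legion = legion - both

def pct(part, whole):
    """Percentage rounded to the nearest integer, as in the text."""
    return round(100 * part / whole)

print(either, both, only_corral, only_legion)                       # 397 216 46 135
print(pct(corral, total), pct(legion, total), pct(neither, total))  # 36 48 45
print(pct(only_corral, either), pct(only_legion, either))           # 12 34
```

The exclusive counts (46 and 135) are percentages of the 397 instances solved by either tool, not of all 727 instances.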


5.2 Performance of Legion+


As LEGION+ utilizes parallelism, to make a fair comparison we halve its time
budget on each verification instance to 1 h. This means that LEGION+ runs both
CORRAL and LEGION in parallel, each with a time budget of 1 h.

Figure 6 shows that the portfolio verifier LEGION+ solves 369 out of 727
instances (51%) with a 1 h time budget, whereas CORRAL solves only 262
Fig. 7. Scatter plot of verification time of CORRAL vs LEGION.


instances (36%) with a time budget of 2 h. There are only 14 instances
that CORRAL solves but LEGION+ is unable to solve. Similarly, there are only
17 instances that LEGION solves but LEGION+ is unable to solve.
With a 2 h timeout, LEGION+ solves 397 instances in total (55%); this is
essentially the virtual best of CORRAL and LEGION with a 2 h timeout.
Figure 8 shows the total time taken (in hours) by CORRAL, LEGION, and
LEGION+ to verify the instances solved by all three of them (213 instances in
total). LEGION+ is 1.9× faster than LEGION and 2.9× faster than CORRAL.
Across the benchmarks that each of the tools solves individually, CORRAL
takes 109 h to solve 262 benchmarks and LEGION takes 112 h to solve 351
benchmarks, whereas LEGION+ solves 369 benchmarks within only 71 h (see Table 3).
Note that the benchmarks used in our study are those on which CORRAL
took more than 200 s. On the remaining benchmarks, LEGION+ will clearly
perform at least as well as CORRAL. We chose to leave them out to ensure that
the experiments run in a reasonable time: there were roughly 14000 of these easy
cases. This allowed us to focus on benchmarks where speedup was important.


6 Related Work 


The high-level idea of using proof-guided abstractions has been long known [3, 
30]. Proofs of unsatisfiability have been used to derive abstractions for 
unbounded model checking in the context of microprocessor verification [30]. 
Amla et al. have also demonstrated that counterexample based abstraction is 
complementary to proof based abstraction and they can be combined in a judi- 
cious manner to reap the benefits of both the techniques for hardware verifi- 
cation tasks [3]. However, program verification has mostly been dominated by 
counterexample-guided abstraction refinement (CEGAR) based strategies. Of 
Fig. 8. Cumulative time taken (in hours) to verify 213 instances that were solved by
all three verifiers. 


the few proposals that use proof-guided underapproximation widening strategies,
most focus on the verification of multi-threaded programs [18,35]. These
techniques underapproximate the set of allowed thread interleavings while
eagerly inlining all procedures. One technique [18] constrains the number of
interleavings to certain bounds, while the other [35] uses dynamically inferred
invariants to construct (potential) underapproximations of interleavings. Note
that these techniques are orthogonal to our approach. Eager inlining is not
feasible for our benchmarks, which is precisely the problem that we address.
Our proposal shows that proof-guided widening strategies can be employed
effectively for verifying large sequential programs. Proofs of unsatisfiability
from underapproximated models have also been used to narrow down the
search space for overapproximation refinement when deciding finite-precision
bit-vector arithmetic with arbitrary bit-vector operations [9]. There, the
underapproximation is performed on the bit-vector variables of a propositional
formula, where some bit-vector variables are encoded with fewer Boolean
variables than their width.

Other than using proofs to guide widening heuristics, proof artifacts such as
interpolants have been used to construct annotations [1,2,27–29] that can be
useful in constraining future search. Such techniques are orthogonal to
underapproximation widening based techniques. However, they can be useful for
LEGION, and we plan to investigate them in the future.

Underapproximation widening has also been used in program synthe- 
sis [37,39,40]. Instead of unleashing the search for the program on the whole 
search space, such techniques search for the desired program in an underap- 
proximated search space. While prior approaches [37] used a pre-defined widen- 
ing sequence, later approaches [39,40] use proofs of unsatisfiability to guide 
the widening sequence. Similar techniques have also been used in the synthe- 


322 P. Chatterjee et al. 


sis of boolean functions [16,17]. Manthan [16,17] constructs an initial guess of 
the Boolean function by sampling the specification and constructing a
decision-tree classifier from the resulting data. It then uses a proof-guided
technique to "repair" the learnt model into the desired function.

There have also been applications of the maximal satisfiable set (MAXSAT)
of an unsatisfiable formula to program debugging. BugAssist [19] attempts to
infer the set of suspicious locations using a MAXSAT formulation over a failing
program trace and the specifications. Bavishi et al. [6] extend the formulation
to rank the suspicious locations such that locations higher up in the ranking
are less likely to cause regressions.

Another line of work is to use fuzzers to sample concrete instances and grad- 
ually build approximations of program behavior for the purpose of deductive 
verification [22] and symbolic execution [34]. However, such approaches use test 
instances and do not apply a proof-guided strategy. 

LEGION is inspired by many of the above algorithms, and there is potential
for incorporating more of these ideas into LEGION in the future.


7 Conclusion 


Bounded model checking approaches for program verification predominantly
focus on CEGAR-based strategies. In this work, we propose a proof-guided
underapproximation widening strategy that behaves in a complementary manner
to the CEGAR technique. This complementary nature allows us to build a
portfolio strategy that takes advantage of both proof-guided underapproximation
widening and CEGAR to deliver a significant speedup in verification time
over both.

Our current approach only looks at the predicates corresponding to the
callsites to determine which are most relevant to the proof of unsatisfiability of
the underapproximated model. In the future, we aim to extract additional
information from the unsat core, which would allow us to explore more involved
widening strategies. Furthermore, combining our approach with underapproximation
techniques that work on the domain of thread interleavings, to deal with both a
large space of sequential behaviors (via many procedures) and concurrent
behaviors (via many interleavings), would be another interesting direction to
explore. We also believe that underapproximation widening may improve the
performance of our distributed bounded model checker, HYDRA [11,12]. Another
direction that we want to pursue is combining bounded model checking algorithms
(both overapproximation refinement and underapproximation widening) with
dynamic analysis [5,13,38] and statistical testing [10,32] based approaches.
Abstract. Formally verifying smart contracts is important due to their
immutable nature, usual open source licenses, and high financial incentives
for exploits. Since 2019, the Ethereum Foundation’s Solidity compiler has
shipped with a model checker. The checker, called SolCMC, has two different
reasoning engines and closely tracks the development of the Solidity language.
We describe SolCMC’s architecture and use from the perspective of developers
of both smart contracts and tools for software verification, and show how to
analyze nontrivial properties of real-life contracts in a fully automated manner.


Keywords: Ethereum · Solidity · Symbolic model checking ·
Constrained Horn clauses · Satisfiability modulo theories


1 Introduction 


The Ethereum Foundation’s compiler for Solidity, the Ethereum platform’s most
used language, had almost 4 million downloads (3,957,195) over the last 60 days
(at the time of submission). Since 2019, this compiler has shipped with a robust,
built-in, easy-to-use symbolic model checker, SolCMC [16], formerly called
SMTChecker. SolCMC models a smart contract, that is, a program for the
Ethereum platform, and its properties as a system of constrained Horn clauses
(CHCs) amenable to IC3-style model checking [34]. Since its deployment, SolCMC
has increasingly served a dual purpose. On the one hand, it gives smart contract
programmers highly visible and easy access to formal verification techniques.
On the other hand, perhaps more subtly but no less importantly, the tool serves
as a sounding board for developers of Horn solvers. Currently the system
interfaces with Spacer [31] and Eldarica [30], making the related techniques available
to a large user base. We expect to integrate many other techniques into SolCMC
through a similar mechanism. For instance, the tool has a bounded model
checking engine for finding bugs by issuing SMT queries to solvers such as z3 [35]
and cvc5 [23].

Smart contracts running on the Ethereum platform hold and control billions 
of dollars through their immutable logic, and therefore bugs can lead to massive 
losses. There are many recent sophisticated tools that increase the security of 
the Ethereum contract ecosystem by detecting smart contract bugs before they 
are deployed. However, new and emerging applications from the diverse user 
base are driving Solidity development at a fast pace and it is difficult to keep 
tools synchronized with the language. We believe that in the long run, the best 
way to ensure that a model checker for Solidity is sustainable is by integrating 
it directly into the compiler distribution, or the main repository of the related 
language tools, as we have done for SolCMC. 

The direct integration of the model checker into the compiler has two main 
advantages. Firstly, we can model precisely and robustly features that are some- 
what specific to Solidity and its applications, such as modeling reentrancy call- 
backs, and the handling of global storage. This makes the model checker capable 
of synthesizing new contracts that serve as counterexamples for correctness, and 
computing inductive invariants for the cases where properties hold. Secondly, the 
short pipeline between the source code and the model allows the presentation 
of both counterexamples and invariants as compiler warnings and annotations 
using a vocabulary that is meaningful for the developer. 

The goal of SolCMC is to verify properties of programs with minimal user
input. Our system supports writing properties as assert statements and can
additionally check other structural properties automatically, such as popping
from an empty array, out-of-bounds array accesses, underflows, overflows,
divisions by zero, and transfers with insufficient balance. Moreover, common
Solidity vulnerabilities such as reentrancy mutability and selfdestruct
reachability can be verified using test harnesses that make the assertion-based
approach more expressive. Thus, the expressiveness of SolCMC allows obtaining
meaningful results for real-life contracts efficiently, in a way that is in practice
fully automated. To demonstrate this, we analyze the Beacon Chain Deposit
Contract, which is the base for Ethereum’s proof-of-stake consensus layer, and
the OpenZeppelin implementation of the ERC777 token standard.

An extended version of this tool paper including appendices showing detailed 
experimental results and other analysis is available online in the accompanying 
artifact, at https://doi.org/10.5281/zenodo.6512173. 


Related Work. Proving correctness of smart contracts and finding bugs in them
can be done at different levels of abstraction. The technical details of how
smart contracts are encoded by SolCMC are presented in [34]. In this tool paper
the emphasis is on orthogonal topics: the usage of options, the generation of
counterexamples in Solidity-like syntax, interfacing with different Horn
solvers, and how contract invariants can be obtained. We also demonstrate the
tool's capabilities by analysing two important and complex contracts: the
Deposit contract and ERC777.

Fig. 1. The Solidity compiler stack with the integrated model checker (in green)
(Color figure online)

Most current tools either analyse the Solidity high level language, similar to 
SolCMC, or work directly on Ethereum Virtual Machine (EVM) bytecode. 

The tools Solc-verify [28] and Verisol [38] verify Solidity properties in an
automated way, allowing models with an unbounded number of transactions, by
translating Solidity to Boogie [33]. This gives the tools an advantage in
engineering resources but, compared to SolCMC's direct encoding as CHCs, makes
producing counterexamples for the user more difficult. Neither of the two tools
produces counterexamples or inductive invariants, and the most recent language
versions are not supported. SmartACE [39] relies on translation from Solidity
to LLVM-IR. This allows for employing multiple analysis tools, but unlike in
SolCMC, where we use a direct encoding and tight solver integration, the tools
are mostly used as black boxes. EThor [37] also uses Horn clauses, but it
encodes EVM bytecode and focuses on specific properties such as reentrancy. The
Certora [24] tool relies on invariants to verify EVM bytecode. It is a
commercial tool used for smart contract audits and is not publicly available.
The K framework [10] is an assisted theorem prover that provides EVM semantics
[29] to analyze EVM bytecode. It is generally able to prove more statements
than automated tools, but requires considerable user interaction. HEVM [22] is
an implementation of EVM in Haskell that also has a symbolic executor for EVM
bytecode. It can prove functional properties but, unlike SolCMC, does not
support inductive properties over multiple transactions and loops. HEVM and
Echidna [4] also provide fuzzing techniques that help determine whether a
candidate assertion is a contract invariant. Slither [14] is a powerful static
analyzer that does not provide formal guarantees but can detect many
vulnerabilities and dangerous patterns. Act [1] is a declarative specification
language for smart contracts that supports three backends: bytecode
verification via HEVM, SMT theorems for contract invariants, and a Coq backend
that exports Coq definitions of contract state transitions. Finally, the
Scribble specification language [13] allows annotating Solidity code and can
generate runtime checks for given properties.
Table 1. SolCMC verification targets

Arithmetic                                   Structural
-------------------------------------------  ------------------------------
arithmetic underflow/overflow                assertions
division by zero                             popping empty array
insufficient transfer balance                out of bounds index access


2 Solidity Model Checking 


A high-level overview of the compilation process is depicted in Fig. 1, with
the model checker module emphasized. When enabled, Solidity model checking
becomes another pass over the source code in the normal compilation process
that starts after parsing and Abstract Syntax Tree (AST) generation. If there
are no errors, the compiler produces the optimized bytecode together with any
warnings, such as counterexamples found by the model checker.

This paper concentrates on SolCMC’s unbounded model checker based on 
CHCs. The tool also has a BMC engine that generates SMT queries and links 
against cvc5 [23] and z3 [35]. 


2.1 The CHC Verification Engine 


SolCMC encodes a smart contract as a system of constrained Horn clauses,
based on [34]. The checker supports loops, multi-transaction computation
paths, contract invariants, tracking contract balances throughout their
lifetimes, and precise multi-contract calls. If the analyzed contract calls
external functions unsafely, the model checker also synthesizes malicious
external actors and represents them as reentrant calls.

The Horn queries are dispatched to a Horn solver. The encoding requires the
solver to support nonlinear Horn clauses and at least the SMT theories for
Linear Integer Arithmetic (LIA), Arrays, and the tuples subset of Algebraic
Datatypes (ADT). Furthermore, nonlinear integer arithmetic and bitwise
operations, if present, are encoded in the respective theories NIA and BV. To
the best of our knowledge, only Spacer [31] and Eldarica [30] satisfy those
requirements. SolCMC has a tight integration with Spacer via its C++ API,
whereas Eldarica is integrated using the compiler's SMT callback [21] and is
currently accessible via solc-js [15], the JavaScript wrapper of the
compiler's WebAssembly binary.

The model checker generates verification targets automatically for the
conditions listed in Table 1. In addition, a smart contract developer can
combine assertions with test harnesses (see, e.g., Sect. 4) to specify complex
behavior. The Solidity language has the statements require and assert, which
SolCMC uses to capture developer intent: conditions inside require statements
are considered assumptions, and assert statements should be true for every
execution. The model checker then treats every assert as a verification target
and attempts either to prove it by finding an invariant or to give a
counterexample for its correctness.
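The following Python sketch is only a toy illustration of the require/assert semantics just described, not of how SolCMC works internally: it enumerates a finite, hypothetical input domain, discards executions whose require condition fails (they revert), and reports the first input that violates the assertion. The example function, its bounds, and the helper name find_counterexample are assumptions chosen for illustration.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Inputs = Tuple[int, ...]


def find_counterexample(
    domain: Iterable[Inputs],
    require: Callable[..., bool],
    assertion: Callable[..., bool],
) -> Optional[Inputs]:
    """Return the first input that passes `require` but violates `assertion`,
    or None if no such input exists in the (finite) domain."""
    for args in domain:
        if not require(*args):
            continue  # reverted execution: treated as an assumption, never a bug
        if not assertion(*args):
            return args  # reachable assertion failure: a counterexample
    return None


# Hypothetical function over two small integers (domain 0 <= a, b <= 255):
#   function f(uint a, uint b) { require(a < 100); assert(a + b <= 354); }
domain = list(product(range(256), repeat=2))

# With the require in place the assertion holds, since a <= 99 and b <= 255.
print(find_counterexample(domain, lambda a, b: a < 100, lambda a, b: a + b <= 354))  # None
# Without it, the first violating input in enumeration order is reported.
print(find_counterexample(domain, lambda a, b: True, lambda a, b: a + b <= 354))  # (100, 255)
```

SolCMC reaches the same verdicts symbolically, for unbounded domains and across transactions, rather than by enumeration.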
2.2 Horn Encoding


SolCMC's CHC encoding is based on the imperative encoding of [25], and is
presented in detail in [34]. Horn logic is a popular formalism for expressing
reachability problems. It is equivalent to existential positive fixed-point
logic [26], and provides a convenient syntax for the use of existentially
quantified predicates that in our encoding represent reachable states and
effects of transactions. The Solidity AST is first transformed into a Control
Flow Graph (CFG). CFG nodes have corresponding CHC predicates, and edges are
encoded as Horn rules with constraints created from the Static Single
Assignment (SSA) form of the statements and expressions of the CFG block.
Below we give an overview of the encoding that highlights the critical parts.
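As a minimal sketch of the SSA step, assuming a straight-line block with no branches (so no phi-nodes are needed) and a token-list representation of expressions invented for this example, each write introduces a fresh version of the target variable and each use refers to the latest version. The resulting equalities are the kind of constraints that end up in the body of a Horn rule; the function to_ssa is hypothetical and is not part of the compiler.

```python
from typing import Dict, List, Tuple

# (target variable, right-hand side as a list of tokens): a hypothetical format
Assignment = Tuple[str, List[str]]


def to_ssa(block: List[Assignment], variables: List[str]) -> Tuple[List[str], Dict[str, int]]:
    """Rename a straight-line block into SSA equalities.

    Returns the equalities and the final version of every variable."""
    version: Dict[str, int] = {v: 0 for v in variables}
    constraints: List[str] = []
    for target, rhs in block:
        # Uses read the current version of each variable...
        renamed = [f"{tok}_{version[tok]}" if tok in version else tok for tok in rhs]
        # ...and the write creates a fresh version of the target.
        version[target] += 1
        constraints.append(f"{target}_{version[target]} = {' '.join(renamed)}")
    return constraints, version


# x += y, followed by a hypothetical second statement x = x * 2
eqs, final = to_ssa([("x", ["x", "+", "y"]), ("x", ["x", "*", "2"])], ["x", "y"])
print(eqs)    # ['x_1 = x_0 + y_0', 'x_2 = x_1 * 2']
print(final)  # {'x': 2, 'y': 0}
```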

The encoding consists of three types of predicates that represent reachable
states or possible transitions: function bodies (B_f) and summaries (S_f)
represent the effect of function calls to f; interfaces (I_C) represent the
states a contract C can reach after initialization and after each transaction;
and nondeterministic interfaces (N_C) encode the effects the environment may
have on a contract C. We use the following variables in the encoding: e, an
integer error flag (each verification target has a unique positive error id;
0 is reserved for no errors); a, the contract address; abi, a tuple of
Solidity's ABI functions; cr, a tuple of Solidity's cryptographic functions
keccak256, sha256, ripemd160, and ecrecover (both abi and cr are constant in
the encoding and are passed through the rules to ensure consistency
everywhere); tx, a tuple of the transaction data, e.g., message sender, data,
block number, etc.; st, the blockchain state, a tuple containing the balances
and storage for every contract (balances are represented by an array mapping
addresses to their balances, and each contract has a storage tuple that
contains the state variables of that contract); and x, the program state,
i.e., the input, output and local variables in the scope of that node. When
necessary, we refer to the state variables as s. For x and st we use primes
to denote the effect of rules on these variables.


Function bodies encode constructors, deployment procedures, and function
summaries. For example, the contract

    contract Acc { uint8 x = 0; function acc(uint8 y) external { x += y; } }

gets encoded into the rules

$$e = 0 \land st = st' \land x = x' \land y = y' \land 0 \le y \le 255 \land 0 \le x' \le 255 \Rightarrow B_{acc}(e, a, abi, cr, tx, st, x, y, st', x', y')$$

stating that the function can always be called, its execution starts with no
error, the initial variables have the current values, and the program
variables' types are constrained;

$$B_{acc}(e, a, abi, cr, tx, st, x, y, st', x', y') \land (x' + y' > 255) \Rightarrow S_{acc}(1, a, abi, cr, tx, st, x, y, st', x', y')$$

stating that an overflow in summation is an error, with label 1; and

$$B_{acc}(e, a, abi, cr, tx, st, x, y, st', x', y') \land (x'' = x' + y') \land (x'' \le 255) \Rightarrow S_{acc}(e, a, abi, cr, tx, st, x, y, st', x'', y')$$

which exits the function with no error and updates the contract state variable x.
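Because x and y are uint8 values, the three rules above can be checked by brute force. The Python sketch below evaluates them over all 65,536 argument pairs, dropping the parameters a, abi, cr, tx and st that the rules pass through unchanged. It only illustrates what the rules mean; it is not how a Horn solver reasons about them, and the tuple layout is our own choice.

```python
from itertools import product
from typing import Set, Tuple

UINT8_MAX = 255
# (e, x, y, x_cur, y_cur): error flag, initial values, current (primed) values
Body = Tuple[int, int, int, int, int]


def entry_rule() -> Set[Body]:
    """Rule 1: no error, current values equal the initial ones, uint8 ranges."""
    return {(0, x, y, x, y) for x, y in product(range(UINT8_MAX + 1), repeat=2)}


def exit_rules(bodies: Set[Body]) -> Set[Body]:
    """Rules 2 and 3: an overflow yields error 1, otherwise x is updated."""
    summaries: Set[Body] = set()
    for e, x, y, x_cur, y_cur in bodies:
        if x_cur + y_cur > UINT8_MAX:
            summaries.add((1, x, y, x_cur, y_cur))          # rule 2: error label 1
        else:
            summaries.add((e, x, y, x_cur + y_cur, y_cur))  # rule 3: x'' = x' + y'
    return summaries


summaries = exit_rules(entry_rule())
errors = [s for s in summaries if s[0] == 1]
print(len(summaries), len(errors))  # 65536 32640
```

Rules 2 and 3 have complementary guards, so every body tuple produces exactly one summary.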


Interface Rules. The interface CFG node is an artificial node that represents
the idle state of a contract. This node is crucial to the encoding when
modelling transactions, querying error flags, committing state changes,
generating counterexamples, and translating inductive contract invariants. It
is reachable at the beginning and end of every transaction. Transactions may
revert due to invalid inputs or program logic, in which case all state changes
are rolled back. The interface node may contain state changes if the
transaction did not revert. Each contract C has a predicate I_C, whose
parameters are a, abi, cr, st and the state variables s of the contract. The
rules only change e, st and s, and for better readability we use an ellipsis
(...) to denote the unchanged parameters. One rule is added per contract,
linking the deployment procedure to the interface: $D_C(\ldots) \Rightarrow I_C(\ldots)$.
For each external function f in the contract C, we add the query rule and the
update rule

$$I_C(\ldots, st, s, \ldots) \land S_f(e, \ldots, st, s, \ldots, st', s', \ldots) \land e > 0 \Rightarrow Err_f(e)$$
$$I_C(\ldots, st, s, \ldots) \land S_f(e, \ldots, st, s, \ldots, st', s', \ldots) \land e = 0 \Rightarrow I_C(\ldots, st', s', \ldots)$$

The Horn query given to the solver then asks whether Err_f(e) is reachable,
for each error label e. In this modelling, if the property is safe, the
inductive invariants chosen by the solver as an interpretation for the
predicates I_C represent the invariants for contracts C.
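For a contract with a small, finite state space, the least model of these rules can be computed by explicit exploration. The sketch below assumes that the state is a single integer and that a transaction summary can be enumerated as (error flag, new state) pairs; it applies the deployment, query, and update rules until a fixed point is reached. A Horn solver instead works symbolically and usually returns an inductive invariant that is weaker than the exact reachable set (for the second, hypothetical contract below, "x is even" would suffice).

```python
from typing import Callable, Iterable, Set, Tuple

# state -> every (error flag, post-state) pair that one transaction can produce
Summary = Callable[[int], Iterable[Tuple[int, int]]]


def explore(initial: int, summary: Summary) -> Tuple[Set[int], Set[int]]:
    """Return (states satisfying I_C, reachable error labels)."""
    interface = {initial}              # deployment rule: D_C(...) => I_C(...)
    errors: Set[int] = set()
    worklist = [initial]
    while worklist:
        s = worklist.pop()
        for e, s_new in summary(s):
            if e > 0:
                errors.add(e)          # query rule: e > 0 => Err(e)
            elif s_new not in interface:
                interface.add(s_new)   # update rule: e = 0 => I_C(s')
                worklist.append(s_new)
    return interface, errors


def acc(x: int) -> Iterable[Tuple[int, int]]:
    """Contract Acc: acc(y) for every uint8 y; an overflow is error 1."""
    for y in range(256):
        yield (1, x) if x + y > 255 else (0, x + y)


def inc_by_two(x: int) -> Iterable[Tuple[int, int]]:
    """Hypothetical: function inc() external { require(x < 200); x += 2; assert(x % 2 == 0); }"""
    if x < 200:                        # otherwise the transaction reverts: no successor
        x_new = x + 2
        yield (0, x_new) if x_new % 2 == 0 else (2, x_new)


print(explore(0, acc)[1])              # {1}: the overflow is reachable
states, errs = explore(0, inc_by_two)
print(errs, max(states))               # set() 200
```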


Nondeterministic Interface Rules. The nondeterministic interface CFG node is an
artificial node that represents every possible behavior of the contract from
an external point of view, over an unbounded number of transactions. This node
is essential to model calls that the contract makes to external unknown
contracts, as well as reentrancy if present. The predicate that represents
this node has the same parameters as the interface predicate, but with the
error flag and an extra set of program variables and blockchain state, in
order to model possible errors and state changes. For every contract C the
encoding adds the base case rule $N_C(0, \ldots, st, s, st, s)$, which performs
no state changes. Then, for every external function f in the contract, the
encoding adds the inductive rule

$$N_C(0, \ldots, st, s, st', s') \land S_f(e, \ldots, st', s', st'', s'') \Rightarrow N_C(e, \ldots, st, s, st'', s'')$$

These rules allow us to encode an external call to unknown code using a single
constraint $N_C(e, \ldots, st, s, st', s')$, which models every reachable state
change in the contract over any unbounded number of transactions. If a
property is unsafe, these rules force the solver to synthesize the behavior of
the adversarial contract. Otherwise, the interpretation of such predicates
gives us inductive reentrancy properties that are true for every external call
to unknown code in the contract.
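The two N_C rules define a transitive closure over sequences of transactions. The sketch below computes that closure explicitly for a finite, hypothetical contract whose state is a small integer, and then uses it as an external call site would: the possible states after a call to unknown code are exactly those s' with (0, s, s') in N_C. The "locked" variant models, as an assumption, a contract whose external functions all revert while a lock is held, so only the base case remains.

```python
from typing import Callable, Iterable, List, Set, Tuple

Summary = Callable[[int], Iterable[Tuple[int, int]]]  # state -> (error, post-state)
Triple = Tuple[int, int, int]  # (error flag, state before, state after)


def nondet_interface(states: Iterable[int], functions: List[Summary]) -> Set[Triple]:
    """Least fixed point of the base-case and inductive N_C rules."""
    n: Set[Triple] = {(0, s, s) for s in states}  # base case: no state change
    frontier = list(n)
    while frontier:
        e, s, s1 = frontier.pop()
        if e != 0:
            continue  # the inductive rule only extends error-free prefixes
        for f in functions:
            for e2, s2 in f(s1):
                triple = (e2, s, s2)
                if triple not in n:
                    n.add(triple)
                    frontier.append(triple)
    return n


def set_x(_state: int) -> Iterable[Tuple[int, int]]:
    """Hypothetical external function set(v), for any v in 0..3."""
    return [(0, v) for v in range(4)]


def after_external_call(n: Set[Triple], s: int) -> Set[int]:
    return {s2 for e, s1, s2 in n if e == 0 and s1 == s}


open_n = nondet_interface(range(4), [set_x])
locked_n = nondet_interface(range(4), [])    # every function reverts under the lock
print(after_external_call(open_n, 1))    # {0, 1, 2, 3}: reentrancy can change x
print(after_external_call(locked_n, 1))  # {1}: x is unchanged by the call
```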
3 User Features


As SolCMC is shipped inside the Solidity compiler, it is available to users
whenever and wherever they interact with the compiler. There are currently
three major ways the compiler is used: 1. interfacing with the WebAssembly
release through official JavaScript bindings; 2. interfacing with a binary
release on the command line; 3. using web-based IDEs, such as Remix [12].
Option 3 is the most accessible, but currently allows only limited
configuration of the model checker through pragma statements in source code.
Options 1 and 2 both allow extensive configuration, but in addition 1 enables
the SMT callback feature needed, e.g., for Eldarica. In 2 the options can be
provided either on the command line or in JSON [19], whereas 1 accepts only
JSON using the JavaScript wrapper [15].

In 1 and 2, several parameters are available to the user for better control
when trying to prove complex properties. We list here some examples, using the
command-line options (without the leading --). The JSON descriptions are named
similarly. The model checking engine (BMC, CHC, or both) is selected with the
option model-checker-engine. Individual verification targets can be chosen
with model-checker-targets, and a per-target verification timeout (in ms) can
be set with model-checker-timeout. By default, all unproved verification
targets are given in a single message after execution. More details are
available by specifying model-checker-show-unproved. Option
model-checker-contracts provides a way to choose the contracts to verify.
Typically the user specifies only the contract they wish to deploy. Inherited
and library contracts are included automatically, which avoids verifying every
contract as the main one. Some options affect the encoding. For example,
integer division and modulo operations can be encoded with the SMT function
symbols div and mod, or by SolCMC's own encoding using linear arithmetic and
slack variables. Depending on the backend, one is often preferred to the
other. The default is the latter; the former is selected with
model-checker-div-mod-no-slacks.
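As an illustration of these options, the Python sketch below builds a standard-JSON compiler input and the equivalent command line for a CHC-only run on one contract. The JSON key names (engine, contracts, targets, timeout, showUnproved, divModNoSlacks), the target names, and the source:Contract syntax for model-checker-contracts reflect our reading of the Solidity documentation and should be checked against the compiler version in use; the file and contract names are hypothetical.

```python
import json
from typing import Dict, List

# Target names as we understand them from the Solidity docs,
# intended to correspond to the rows of Table 1.
ALL_TARGETS = ["assert", "underflow", "overflow", "divByZero",
               "balance", "popEmptyArray", "outOfBounds"]


def standard_json_input(path: str, source: str, contract: str,
                        targets: List[str], timeout_ms: int) -> Dict:
    """Compiler input that asks only the CHC engine to check `contract`."""
    return {
        "language": "Solidity",
        "sources": {path: {"content": source}},
        "settings": {
            "modelChecker": {
                "engine": "chc",
                "contracts": {path: [contract]},  # verify only the deployed contract
                "targets": targets,
                "timeout": timeout_ms,            # per verification target, in ms
                "showUnproved": True,
                "divModNoSlacks": False,          # keep the slack-variable encoding
            }
        },
    }


def command_line(path: str, contract: str, targets: List[str], timeout_ms: int) -> List[str]:
    """The same request expressed with the command-line options named above."""
    return ["solc", path,
            "--model-checker-engine", "chc",
            "--model-checker-contracts", f"{path}:{contract}",
            "--model-checker-targets", ",".join(targets),
            "--model-checker-timeout", str(timeout_ms),
            "--model-checker-show-unproved"]


request = standard_json_input("Harness.sol", "/* source */", "Harness", ["assert", "overflow"], 10000)
print(json.dumps(request["settings"], indent=2))
print(" ".join(command_line("Harness.sol", "Harness", ["assert", "overflow"], 10000)))
```

The JSON document would be passed to solc --standard-json on standard input; the command line is the equivalent for option 2 above.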

Solidity provides the NatSpec [20] format for rich documentation. An
annotation /// @custom:smtchecker abstract-function-nondet instructs SolCMC to
abstract a function nondeterministically. Abstracting functions as an
Uninterpreted Function [32] is under development.


Counterexamples and Inductive Invariants. When a verification target is
disproved, SolCMC provides a readable counterexample describing how to reach
the bug. In addition to the line of code where the verification target is
breached, the counterexample states the trace of transactions and function
calls leading to the failure, along with concrete values substituted for the
arguments, and the values of the state variables at the point of failure. When
necessary, the trace also includes synthesized reentrant calls that trigger
the failure.

Similarly, when SolCMC proves a verification target, the user may instruct the
checker to provide safe inductive invariants. The invariants can, for
instance, be used as an additional proof that the verification target holds.
Technically, the invariants are interpretations for the predicates in the CHC
system and are presented in a human-readable Solidity-like syntax. As with
counterexamples, invariants are also given for the predicates guaranteeing
correctness under reentrancy. The extended version of this paper contains a
short example illustrating the counterexamples and inductive invariants. It
also presents more complex examples of both features, which were obtained from
our experiments with the ERC777 token standard.


4 Real World Experiments 


In this section we analyse two real-world smart contract systems using SolCMC.
Both contracts are of great practical importance and highly nontrivial for
automated tools due to their use of complex features and loops, and the need
to produce nontrivial inductive invariants. While only the main results are
stated in this section, we want to emphasize that the results were achieved
after extensive, albeit mechanical, experimentation with the two backend
solvers (Spacer and Eldarica) and a range of parameters. To us, the fact that
they were successfully analysed using an automatic method is strong evidence
of the combined power of our encoding approach and the backend solvers.


4.1 CHC Solver Options 


The options we pass to the underlying CHC solvers Spacer and Eldarica can make
the difference between a quick solution and divergence. For Spacer, we use the
options rewriter.pull_cheap_ite=true, which pulls if-then-else terms to the
top level when it can be done cheaply; fp.spacer.q3.use_qgen=true, which
enables the quantified lemma generalizer; fp.spacer.mbqi=false, which disables
model-based quantifier instantiation; and fp.spacer.ground_pobs=false, which
disables grounding proof obligations using values from a model. For Eldarica,
we have found adjusting the predicate abstraction to be useful: -abstract:off
disables abstraction, -abstract:term uses term abstraction, and -abstract:oct
uses octagon abstraction.
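To reproduce this kind of parameter sweep on the Horn queries that SolCMC produces, one could generate one command line per configuration. The sketch below assumes that Spacer is run through the z3 executable, which accepts parameters as key=value arguments, and that Eldarica is run through its eld launcher script; the query file name is hypothetical, and actually running the commands (e.g. with subprocess.run and a one-hour timeout) is left out.

```python
from typing import Dict, List

# Spacer options from the text, passed to z3 as key=value arguments.
SPACER_OPTIONS: Dict[str, str] = {
    "rewriter.pull_cheap_ite": "true",
    "fp.spacer.q3.use_qgen": "true",
    "fp.spacer.mbqi": "false",
    "fp.spacer.ground_pobs": "false",
}

# Eldarica predicate-abstraction settings from the text.
ELDARICA_ABSTRACTIONS = ["off", "term", "oct"]


def spacer_command(query: str) -> List[str]:
    return ["z3"] + [f"{k}={v}" for k, v in SPACER_OPTIONS.items()] + [query]


def eldarica_commands(query: str) -> List[List[str]]:
    return [["eld", f"-abstract:{mode}", query] for mode in ELDARICA_ABSTRACTIONS]


for cmd in [spacer_command("query.smt2")] + eldarica_commands("query.smt2"):
    print(" ".join(cmd))
```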


4.2 Deposit Contract 


The Ethereum 2.0 (Eth2) [9] Deposit Contract [2,3] is a smart contract that
runs on Ethereum 1.0, collecting deposits from accounts that wish to be
validators on Eth2. By the time of submission of this paper, more than
9,100,194 ETH were held by the Deposit Contract, the equivalent of tens of
billions of USD at recent rates. Besides the financial incentive, this
contract's functionality is essential to the progress of the protocol. The
contract was formally verified before deployment [36] and further proved safe
[27] with a considerable amount of manual work. Despite having relatively few
lines of code (fewer than 200), the contract remains a challenge for automated
tools, because it uses many complex constructs at the same time, such as ABI
encoding functions, loops, dynamic types, and hash functions.
As part of the logic of the deposit function, a new entry is created in a
Merkle tree for the caller. The contract asserts that such an entry can always
be found, expressed as an assert(false) in a program location reachable only
if such an entry is not found (line 162 in [2]). Using SolCMC, this problem can
be encoded into a 1.4MB Horn logic file containing 127 rules, which uses the
SMT theories for Arrays, ADTs, NIA, and BV. After a syntactical change,
Eldarica can show the property safe automatically in 22.4s, while Spacer times
out after 1h (see the extended version for details). The change is necessary
to avoid bit-vector reasoning and consists of replacing the test
if ((size & 1) == 1) with a semantically equivalent form if ((size % 2) == 1)
on lines 88 and 153 in [2].
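The rewrite is sound because, for a non-negative integer, the lowest bit is exactly the remainder modulo 2. The Python snippet below spot-checks this on small values, on the largest uint256 values, and on random uint256 values; it is a sanity check, not a proof, and the helper names are our own.

```python
import random

UINT256_MAX = 2**256 - 1


def odd_bitwise(size: int) -> bool:
    return (size & 1) == 1   # original test: needs bit-vector reasoning


def odd_modulo(size: int) -> bool:
    return (size % 2) == 1   # rewritten test: expressible in integer arithmetic


rng = random.Random(0)
samples = list(range(1024)) + [UINT256_MAX, UINT256_MAX - 1]
samples += [rng.randrange(UINT256_MAX + 1) for _ in range(10_000)]
mismatches = [v for v in samples if odd_bitwise(v) != odd_modulo(v)]
print(len(mismatches))  # 0
```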
4.3 ERC777 


ERC777 [6] is a token standard that offers extra features compared to the 
ERC20 [5] standard. Besides the usual transfer and allowance features, ERC777 
mainly adds account operators and transfer hooks which allow smart contracts 
to react to sending and receiving tokens. This is similar to the native feature 
of reacting to receiving Ether. In this experiment we analyze the OpenZeppelin 
implementation [11] of ERC777. This contract is an interesting benchmark for 
automated tools not only because of its importance, but also because it is a rather 
large smart contract system with 1200 lines of Solidity code, in 8 files, and it 
uses complex high level constructs such as assembly blocks, heavy inheritance, 
strings, arrays, nested mappings, loops, hash functions, and makes external calls 
to unknown code. The implementation follows the specification precisely, and 
does not guarantee a basic safety property related to tokens: The total supply of 
tokens should not change during a transfer. 

Compared to the usual ERC20 token transfer, which simply decreases and
increases the balances of the two accounts involved in the transfer, the
ERC777 transfer function may call unknown contracts to notify them that they
are sending/receiving tokens. The logic in these external contracts is
completely arbitrary and unknown to the token contract. For example, they
could make a reentrant call to one of the nine mutable functions in the ERC777
token's external interface.

Since the analyzed ERC777 implementation is agnostic about how tokens are
initially allocated, no tokens are distributed in the base implementation at
deployment. Therefore, to study the property, we write the following test
harness [7], which uses the ERC777 token implemented by OpenZeppelin.


    import "<path>/ERC777.sol";

    contract Harness is ERC777 {
        constructor(
            address[] memory defOps_,
            uint amt_
        ) ERC777("ERC777", "E7", defOps_) {
            _mint(msg.sender, amt_, "", "");
        }

        function transfer(address r, uint a)
            public override returns (bool) {
            uint prev = totalSupply();
            bool res = ERC777.transfer(r, a);
            uint post = totalSupply();
            assert(prev == post);
            return res;
        }
    }
Fig. 2. Transaction trace that violates the safety property in transfer
(sequence diagram with lifelines Sender, ERC777, and Recipient, showing
transfer, _callTokensToSend/tokensToSend, _callTokensReceived/tokensReceived,
and the reentrant operatorBurn)


First, we allocate amt_ tokens to the creator of the contract, in order to
have tokens circulating. Then, we override the transfer function with one that
simply wraps the transfer function of the ERC777 contract and asserts that the
property we want to verify holds after the original transfer.

The resulting Horn encoding is 15 MB in size and contains 545 rules. The
property can be shown unsafe by Eldarica in all its configurations, the
quickest taking slightly less than 3 min, including generating the
counterexample (see the extended version for details). All of Spacer's
configurations time out after 1h. Since the property is unsafe, SolCMC also
provides the full transaction trace required to reach the assertion failure.
The transaction trace is visualized in Fig. 2 in the form of a sequence
diagram, where solid arrows represent function calls and dashed arrows
represent the return of the execution control. The full output of the tool
can be found in the extended version.

The diagram shows the transaction trace from the call to transfer of ERC777
(after our wrapper contract has been created and its transfer was called).
transfer performs three internal function calls (in orange):
1) _callTokensToSend performs the external call to notify the sender; 2) _move
moves the tokens from the sender to the recipient; 3) _callTokensReceived
notifies the recipient. The external calls to unknown code are shown in red.
The transaction trace also contains the synthesized behaviour for the
recipient (in purple). It is a reentrant call to operatorBurn in the ERC777
token contract itself, where some of the tokens of the recipient contract are
burned. At the end of the execution of transfer, the assertion no longer
holds: the total supply of tokens after the call differs from the total supply
before the call, as some tokens were burned during the transaction.

Given the number of mutable external functions of ERC777 and their
complexity, we consider the discovery of the counterexample to be quite an
achievement. We ascribe the success to the combined power of the CHC encoding
and the Horn solver.

One way to guarantee that our property holds is to disallow reentrancy
throughout the contract using a mutex. After changing the ERC777 library [8],
we ran the tool again on our test harness. Spacer timed out, but Eldarica was
able to prove that the restricted system is safe in all its configurations,
the fastest one finishing in 26.2s, including the generation of the inductive
invariants for every predicate. SolCMC now reports back the reentrancy
property <errorCode> = 0 given as part of the proof (the property is presented
here in a simplified manner; see the extended version for details). The
inductive property states that no external call performed by the analyzed
contract can lead to an error. This shows that the reentrant path can no
longer be taken.
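The counterexample and the effect of the mutex can be replayed in a deliberately tiny Python model. Everything in it is an assumption made for illustration and it is not the OpenZeppelin code: ToyToken keeps only balances and a total supply, calls a recipient hook after moving tokens (standing in for _callTokensReceived), and exposes an operator_burn entry point; a Python exception stands in for a Solidity revert, which rolls back all state changes.

```python
from typing import Callable, Dict, Optional


class Revert(Exception):
    """Stands in for a Solidity revert."""


class ToyToken:
    def __init__(self, holder: str, amount: int, use_mutex: bool) -> None:
        self.balances: Dict[str, int] = {holder: amount}
        self.total_supply = amount
        self.use_mutex = use_mutex
        self.locked = False
        self.hooks: Dict[str, Callable[["ToyToken"], None]] = {}

    def _lock(self) -> None:
        if self.use_mutex:
            if self.locked:
                raise Revert("reentrant call")
            self.locked = True

    def _unlock(self) -> None:
        if self.use_mutex:
            self.locked = False

    def operator_burn(self, account: str, amount: int) -> None:
        self._lock()
        try:
            if self.balances.get(account, 0) < amount:
                raise Revert("burn amount exceeds balance")
            self.balances[account] -= amount
            self.total_supply -= amount
        finally:
            self._unlock()

    def transfer(self, sender: str, recipient: str, amount: int) -> None:
        self._lock()
        try:
            if self.balances.get(sender, 0) < amount:
                raise Revert("transfer amount exceeds balance")
            self.balances[sender] -= amount                     # _move
            self.balances[recipient] = self.balances.get(recipient, 0) + amount
            hook: Optional[Callable[["ToyToken"], None]] = self.hooks.get(recipient)
            if hook is not None:
                hook(self)                                      # unknown code runs here
        finally:
            self._unlock()


def harness_transfer(token: ToyToken, sender: str, recipient: str, amount: int) -> bool:
    """Mirror of the Solidity harness: the total supply must not change."""
    prev = token.total_supply
    snapshot = (dict(token.balances), token.total_supply)
    try:
        token.transfer(sender, recipient, amount)
    except Revert:
        token.balances, token.total_supply = snapshot  # revert rolls everything back
        return False
    if token.total_supply != prev:
        raise AssertionError("total supply changed during transfer")
    return True


def malicious_recipient(token: ToyToken) -> None:
    token.operator_burn("bob", 1)  # reentrant burn of tokens just received


for use_mutex in (False, True):
    token = ToyToken("alice", 100, use_mutex)
    token.hooks["bob"] = malicious_recipient
    try:
        print(use_mutex, harness_transfer(token, "alice", "bob", 10), token.total_supply)
    except AssertionError as err:
        print(use_mutex, "assertion failed:", err)
```

Without the mutex the reentrant burn breaks the assertion; with it, the reentrant call reverts and, because the hook does not catch the revert, so does the whole transfer, leaving the total supply unchanged.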


4.4 Discussion 


While producing the above analysis of the real-life contracts, we experimented
with two backend solvers, Spacer and Eldarica, and a range of parameters for
them. This phase (documented in the extended version of this paper) was
critical in producing the results, because Eldarica and Spacer excel in
different domains and parameter selection has a major impact on both
verification success and run time. In both cases above, Eldarica performed
clearly better than Spacer. This seems to be because Eldarica handles
algebraic data types better than Spacer. This conclusion is backed by
experimental evidence: we ran SolCMC using both Spacer and Eldarica on the
SolCMC regression test suite, consisting of 1098 Solidity files [17] and 3688
Horn queries [18]. The experiment shows that while the solvers give overall
similar results, in two categories that make heavy use of ADTs, Eldarica is
consistently able to solve more benchmarks than Spacer. For lack of space, the
detailed analysis is given in the extended version.

Our encoding uses tuples to encode data that naturally belongs together.
Moreover, arrays of tuples are used to emulate Uninterpreted Functions (UFs)
to abstract injective functions such as cryptographic primitives. This is
necessary because UFs are not syntactically allowed in predicates of Horn
instances. While this increases the complexity of the problem, we have chosen
this path to reduce encoding complexity, considering that a preprocessing step
may become available in the future to flatten such tuples and arrays.


5 Conclusions and Future Work 


This paper presents the model checker SolCMC, which ships with the Ethereum
Foundation's compiler for the Solidity language. We believe that the automated
and usable tool has the potential to link a high volume of Solidity developers
with the community working on tools for formal verification. The tool is
stable and, having been integrated into the compiler, closely tracks the
quickly developing language.

We advocate for a direct encoding approach where the same AST gets compiled
both into EVM bytecode and into a verification model in SMT-LIB2 or the format
used in the CHC competition. In our experience, this makes it more natural to
model features specific to Solidity and Ethereum smart contracts, and to
generate usable counterexamples and inductive invariants, than first producing
a language-agnostic intermediate verification representation that is then
processed for reasoning engines.

We argue for the ease of use of the tool by showing nontrivial properties of
real-life contracts. The experiments also identify interesting future
development opportunities in the current CHC formalism. We show how the
formalism's limitations can be worked around using algebraic data types, and
discuss their impact on tool efficiency.
Abstract. Temporal hyperproperties are system properties that relate multiple
execution traces. For (finite-state) hardware, temporal hyperproperties are
supported by model checking algorithms, and tools for general temporal logics
like HyperLTL exist. For (infinite-state) software, the analysis of temporal
hyperproperties has, so far, been limited to k-safety properties, i.e.,
properties that stipulate the absence of a bad interaction between any k
traces. In this paper, we present an automated method for the verification of
$\forall^k\exists^l$-safety properties in infinite-state systems. A
$\forall^k\exists^l$-safety property stipulates that for any k traces, there
exist l traces such that the resulting k + l traces do not interact badly.
This combination of universal and existential quantification enables us to
express many properties beyond k-safety, including, for example, generalized
non-interference or program refinement. Our method is based on a
strategy-based instantiation of existential trace quantification combined with
a program reduction, both in the context of a fixed predicate abstraction.
Notably, our framework allows for mutual dependence of strategy and reduction.


Keywords: Hyperproperties · HyperLTL · Infinite-state systems ·
Predicate abstraction · Hyperliveness · Verification · Program reduction


1 Introduction 


Hyperproperties are system properties that relate multiple execution traces of
a system [22] and commonly arise, e.g., in information-flow policies [35], the
verification of code optimizations [6], and robustness of software [19].
Consequently, many methods for the automated verification of hyperproperties
have been developed [27,39-41]. Almost all previous approaches verify a class
of hyperproperties called k-safety, i.e., properties that stipulate the
absence of a bad interaction between any k traces in the system. For example,
we can express a simple form of non-interference as a 2-safety property by
stating that any two traces that agree on the low-security inputs should
produce the same observable output.
The vast landscape of hyperproperties does, however, stretch far beyond
k-safety. The overarching limitation of k-safety (or, more generally, of
hypersafety [22]) is an implicit universal quantification over all
executions. By contrast, many
properties of interest, ranging from applications in information-flow control
to robust cleanness, require a combination of universal and existential
quantification. For example, consider the reactive program in Fig. 1, where
$x_{\mathbb{N}}$ denotes a nondeterministic choice of a natural number. We
assume that h, l, and o are a high-security input, a low-security input, and a
low-security output, respectively. This program violates the simple 2-safety
non-interference property given above, as the nondeterminism influences the
output. Nevertheless, the program is “secure” in the sense that an attacker
that observes low-security inputs and outputs cannot deduce information about
the high-security input. To capture this formally, we use a relaxed notion of
non-interference, in the literature often referred to as generalized
non-interference (GNI) [35]. We can, informally, express GNI in a temporal
logic as follows:


Va Vr 3r”. 0 (ox Soh Alem Ian A hg = hrr) 


This property requires that for any two traces $\pi, \pi'$, there exists some
trace $\pi''$ that, globally, agrees with the low-security inputs and outputs
on $\pi$ but with the high-security inputs on $\pi'$. Phrased differently, any
observation of the low-security input-output behavior is compatible with every
possible high-security input. The program in Fig. 1 satisfies GNI. Crucially,
GNI is no longer a hypersafety property (and, in particular, no k-safety
property for any k), as it requires a combination of universal and
existential quantification.
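To make the universal-universal-existential pattern concrete, the Python sketch below checks GNI by brute force for a single loop iteration of the program in Fig. 1, read as: if h > l then o := l + x, otherwise x := a fresh natural number and o := x if x > l, else o := l. This reading of the figure, the bounds, and the leaky comparison program are assumptions of the sketch, which is a finite sanity check and not the verification method of this paper. The witness trace may use a larger bound on its nondeterministic choice than the observed traces, so that every observed output can be matched.

```python
from itertools import product
from typing import Callable, Iterable

Step = Callable[[int, int, int], int]  # (h, l, nondeterministic choice) -> o


def fig1_step(h: int, l: int, x: int) -> int:
    """One iteration of the program, as read from Fig. 1 (an assumption)."""
    if h > l:
        return l + x
    return x if x > l else l


def leaky_step(h: int, l: int, x: int) -> int:
    """Hypothetical leaky program: every output is at least h."""
    return h + x


def violates_2safety_ni(step: Step, hs, ls, xs) -> bool:
    """Two runs that agree on l but differ on o violate 2-safety NI."""
    runs = [(l, step(h, l, x)) for h, l, x in product(hs, ls, xs)]
    return any(l1 == l2 and o1 != o2 for (l1, o1), (l2, o2) in product(runs, runs))


def gni_holds(step: Step, hs: Iterable[int], ls: Iterable[int],
              observed_xs: Iterable[int], witness_xs: Iterable[int]) -> bool:
    """For all pi, pi' there is a pi'' with l, o from pi and h from pi' (one step)."""
    hs, witness_xs = list(hs), list(witness_xs)
    for h, l, x in product(hs, ls, observed_xs):                    # trace pi
        o = step(h, l, x)
        for h2 in hs:                                               # trace pi' (only h matters)
            if not any(step(h2, l, w) == o for w in witness_xs):    # trace pi''
                return False
    return True


H = L = range(4)
X = range(4)        # nondeterminism in the observed traces
W = range(4 + 4)    # larger bound for the witness trace
print(violates_2safety_ni(fig1_step, H, L, X))   # True
print(gni_holds(fig1_step, H, L, X, W))          # True
print(gni_holds(leaky_step, H, L, X, W))         # False
```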


1.1 Verification Beyond k-Safety 


Instead, GNI falls in the general class of $\forall^*\exists^*$-safety
properties. Concretely, a $\forall^k\exists^l$-safety property (using k
universal and l existential quantifiers) stipulates that for any k traces,
there exist l traces such that the resulting k + l traces do not interact
badly. k-safety properties are the special case where l = 0. We study the
verification of such properties in infinite-state systems arising, e.g., in
software. In contrast to k-safety, where a broad range of methods has been
developed [10,27,39-41], no method for the automated verification of temporal
$\forall^*\exists^*$ properties in infinite-state systems exists (we discuss
related approaches in Sect. 8).

Our novel verification method is based on a game-based reading of existential
quantification combined with the search for a program reduction. The
game-based reading of existential quantification instantiates existential
trace quantification with an explicit strategy and constitutes the first
practicable method for the verification of $\forall^*\exists^*$-properties in
finite-state systems [23]. Program reductions are a well-established technique
to align executions of independent program fragments (such as the individual
program copies in a self-composition) to obtain proofs with easier invariants
[27,34,39].

    repeat
      readInput(h, l)
      if h > l then
        o ← l + x_ℕ
      else
        x ← x_ℕ
        if x > l then
          o ← x
        else
          o ← l

Fig. 1. An example program is depicted.
So far, both techniques are limited to their respective domains, i.e., the game-based approach has only been applied to finite-state systems and synchronous
specifications, and reductions have (mostly) been used for the verification of k- 
safety. We combine both techniques yielding an effective (and first) verification 
technique for hyperproperties beyond k-safety in infinite-state systems arising in 
software. Notably, our search for a reduction and our strategy-based instantiation of
existential quantification are mutually dependent, i.e., a particular strategy might
depend on a particular reduction and vice versa.


1.2 Contributions and Structure 


The starting point of our work is a new temporal logic called Observation-based 
HyperLTL (OHyperLTL for short). Our logic extends the existing hyperlogic 
HyperLTL [21] with capabilities to reason about asynchronous properties (i.e., 
properties where the individual traces are traversed at different speeds), and to 
specify properties using assertions from arbitrary background theories (to reason 
about the infinite domains encountered in software) (Sect. 4). 

To automatically verify $\forall^*\exists^*$ OHyperLTL properties, we combine program
reductions with a strategy-based instantiation of existential quantification, both 
in the context of a fixed predicate abstraction. To facilitate this combination, we 
first present a game-based approach that automates the search for a reduction. 
Concretely, we construct an abstract game where a winning strategy for the 
verifier directly corresponds to a reduction with accompanying proof. As a side 
product, our game-based interpretation simplifies the search for a reduction in 
a given predicate abstraction as, e.g., studied by Shemer et al. [39] (Sect. 5). 

Our strategic (game-based) view on reductions allows us to combine them 
with a game-based instantiation of existential quantification. Here, we view the 
existentially quantified traces as being constructed by a strategy that, iteratively, 
reacts to the universally quantified traces. As we phrase both the search for a 
reduction and the search for existentially quantified traces as a game, we can 
frame the search for both as a combined abstract game. We prove the sound- 
ness of our approach, i.e., a winning strategy for the verifier constitutes both 
a strategy for the existentially quantified traces and an accompanying (mutually
dependent) reduction. Despite its finite nature, constructing the abstract game 
is expensive as it involves many SMT queries. We propose an inner refinement 
loop that determines the winner of the game (without constructing it explicitly) 
by computing iterative approximations (Sect. 6). 

We have implemented our verification approach in a prototype tool called 
HyPA (short for Hyperproperty Verification with Predicate Abstraction) and 
evaluate HyPA on k-safety properties (that can already be handled by existing 
methods) and on $\forall^*\exists^*$-safety benchmarks that cannot be handled by any existing
tool (Sect. 7). 


Contributions. In short, our contributions include the following: 


— We propose a temporal hyperlogic that can specify asynchronous hyperprop- 
erties in infinite-state systems; 
— We propose a game-based interpretation of a reduction (improving and simplifying previous methods for k-safety [39]);

— We combine a strategy-based instantiation of existentially quantified traces 
with the search for a reduction. This yields a flexible (and first) method for 
the verification of temporal $\forall^*\exists^*$ properties. We propose an iterative method
to solve the abstract game that avoids an expensive explicit construction; 

— We provide and evaluate a prototype implementation of our method. 


2 Overview: Reductions and Quantification as a Game 


Our verification approach hinges on the observation that we can express both 
a reduction and existential trace quantification as a game. In this section, we 
provide an overview of our game-based interpretations. We begin by outlining 
our game-based reading of a reduction (illustrating this in the simpler case of 
k-safety) in Sect. 2.1 and then extend this to include a game-based interpretation 
of existential quantification in Sect. 2.2. 


2.1 Reductions as a Game 


Consider the two programs in Fig. 2 and the specification that both programs
produce the same output (on initially identical values for x). We can formalize 
this in our logic OHyperLTL (formally defined in Sect. 4) as follows: 


$$\forall^{P1} \pi_1 : (pc = 2).\ \forall^{P2} \pi_2 : (pc = 2).\ (x_{\pi_1} = x_{\pi_2}) \rightarrow \Box(x_{\pi_1} = x_{\pi_2})$$


The property states that for all traces $\pi_1$ in P1 and $\pi_2$ in P2 the LTL specification
$(x_{\pi_1} = x_{\pi_2}) \rightarrow \Box(x_{\pi_1} = x_{\pi_2})$ holds (where $x_\pi$ refers to the value of $x$ on trace
$\pi$). Additionally, the observation formula $pc = 2$ marks the positions at which
the LTL property is evaluated: We only observe a trace at steps where $pc = 2$
(i.e., where the program counter is at the output position).

The verification of our property involves reasoning about two copies of our 
system (in this case, one of P1 and one of P2) on disjoint state spaces. Conse- 
quently, we can interleave the statements of both programs (between two obser- 
vation points) without affecting the behavior of the individual copies. We refer 
to each interleaving of both copies as a reduction. The choice of a reduction 
drastically influences the complexity of the needed invariants [27,34,39]. Given 
an initial abstraction of the system [30,39], we aim to discover a suitable reduc- 
tion automatically. Our first observation is that we can phrase the search for a 
reduction as a game as follows: In each step, the verifier decides on a scheduling
(i.e., a non-empty subset $M \subseteq \{1,2\}$) that indicates which of the copies should
take a step (i.e., $i \in M$ iff copy $i$ should make a program step). Afterward, the
refuter can choose an abstract successor state compatible with that scheduling,
after which the process repeats. This naturally defines a finite-state two-player
safety game that we can solve efficiently.¹ If the verifier wins, a winning strategy


¹ The LTL specification is translated to a symbolic safety automaton that moves
alongside the game. For the sake of readability, we omit the automaton from the
following discussion.
(a) Program P1

    1: repeat
    2:   print(x)
    3:   y ← 2x
    4:   while y > 0 do
    5:     y ← y − 1
    6:     x ← 2x

(b) Program P2

    1: repeat
    2:   print(x)
    3:   y ← x
    4:   while y > 0 do
    5:     y ← y − 1
    6:     x ← 4x

(c) Winning strategy for the verifier. [Diagram of abstract states labelled with program counters and predicates such as x1 = x2 and y1 = 2y2.]


Fig. 2. Two output-equivalent programs P1 and P2 are depicted in Fig. 2a and 2b. In
Fig. 2c a possible winning strategy for the verifier is given. Each abstract state contains
the value of the program counter of both copies (given as the pair at the top) and the
predicates that hold in that state. For the sake of readability we omit the trace variables
and write, e.g., $x_1$ for $x_{\pi_1}$. We mark the initial state with an incoming arrow. The
outer label at each state gives the scheduling $M \subseteq \{1,2\}$ chosen by the strategy in
that state.


directly corresponds to a reduction and accompanying inductive invariant for 
the safety property within the given abstraction. 

For our example, we give (parts of) a possible winning strategy in Fig. 2c. In 
each abstract state, the strategy chooses a scheduling (written next to the state), 
and all abstract states compatible with that scheduling are listed as successors. 
Note that whenever the program counter is (2,2) (i.e., both programs are at
their output position), it holds that $x_1 = x_2$ (as required). The example strategy
schedules in lock-step for the most part (by choosing $M = \{1,2\}$) but lets P1
take the inner loop twice, thereby maintaining the linear invariants $x_1 = x_2$
and $y_1 = 2y_2$. In particular, the resulting reduction is property-based [39] as
the scheduling is based on the current (abstract) state. Note that the program 
cannot be verified with only linear invariants in a sequential or parallel (lock- 
step) reduction. 
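The effect of the reduction can be observed in a concrete simulation. The Python sketch below assumes our reading of Fig. 2 (P1: $y \leftarrow 2x$, then the loop $y \leftarrow y - 1;\ x \leftarrow 2x$; P2: $y \leftarrow x$, then the loop $y \leftarrow y - 1;\ x \leftarrow 4x$) and non-negative start values. It runs one outer iteration under the schedule described above (two inner-loop iterations of P1 per iteration of P2) and asserts $x_1 = x_2$ and $y_1 = 2y_2$ at every combined step. It only tests sample inputs; the paper obtains the invariant from the abstract game.

```python
# Assumed reading of Fig. 2 (hypothetical where the figure is unclear):
# P1: y <- 2x; while y > 0 do y <- y - 1; x <- 2x
# P2: y <- x;  while y > 0 do y <- y - 1; x <- 4x


def run_reduction(x0):
    """One outer iteration of P1 and P2 from x1 = x2 = x0 (x0 >= 0), where P1
    takes its inner loop twice for every inner-loop iteration of P2."""
    x1 = x2 = x0
    y1, y2 = 2 * x1, x2                   # line 3 of P1 and of P2
    while y2 > 0:
        assert x1 == x2 and y1 == 2 * y2  # the linear invariant
        for _ in range(2):                # P1: two inner-loop iterations
            y1 -= 1
            x1 = 2 * x1
        y2 -= 1                           # P2: one inner-loop iteration
        x2 = 4 * x2
    assert y1 == 0 and x1 == x2           # both loops exit together
    return x1, x2


def run_lockstep(steps):
    """Lock-step from x = 1: one inner-loop iteration of each program per step."""
    x1 = x2 = 1
    for _ in range(steps):
        x1, x2 = 2 * x1, 4 * x2
    return x1, x2                         # x2 == x1 ** 2, not a linear relation


for x0 in range(6):
    print(x0, run_reduction(x0))
print(run_lockstep(2))                    # (4, 16)
```

Under lock-step scheduling the two values of $x$ drift apart quadratically ($x_2 = x_1^2$ in this run), which illustrates why no linear invariant suffices without a reduction.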


2.2 Beyond k-Safety: Quantification as a Game 


We build upon this game-based interpretation of a reduction to move beyond 
k-safety. As a second example, consider the two programs Q1 and Q2 in Fig. 3,
where $\star_\tau$ denotes a nondeterministic choice of type $\tau \in \{\mathbb{N}, \mathbb{B}\}$. We wish to check
that Q1 refines Q2, i.e., all output behavior of Q1 is also possible in Q2. We can 
express this in our logic as follows: 


$$\forall^{Q1} \pi_1 : (pc = 2).\ \exists^{Q2} \pi_2 : (pc = 2).\ \Box(a_{\pi_1} = a_{\pi_2})$$


The property states that for every trace $\pi_1$ in Q1 there exists a trace $\pi_2$ in
Q2 that outputs the same value. The quantifiers range over infinite traces of
variable assignments (with infinite domains), making a direct verification of the 
(a) Program Q1

    1: repeat
    2:   print(a)
    3:   x ← ⋆_ℕ
    4:   while ⋆_𝔹 do
    5:     x ← x + 1
    6:   y ← x
    7:   while y > 0 do
    8:     a ← a + x
    9:     y ← y − 1

(b) Program Q2

    1: repeat
    2:   print(a)
    3:   x ← ⋆_ℕ
    4:   y ← x
    5:   while y > 0 do
    6:     a ← a + x + ⋆_ℕ
    7:     y ← y − 1

(c) Winning strategy for the verifier. [Diagram of abstract states with the chosen schedulings and restrictions.]


Fig. 3. Two programs Q1 and Q2 are given in Fig. 3a and 3b. In Fig. 3c a possible
winning strategy for the verifier is depicted. The outer label gives the scheduling $M \subseteq \{1,2\}$ and, if applicable, the restriction chosen by the witness strategy.


quantifier alternation challenging. In contrast to alternation-free formulas, we
cannot reduce the verification to verification on a self-composition [8,28]. Instead,
we adopt (yet another) game-based interpretation by viewing the existentially
quantified traces as being resolved by a strategy (called the witness strategy)
[23]. That is, instead of trying to find a witness trace $\pi_2$ in Q2 when given the
entire trace $\pi_1$, we interpret the $\forall\exists$ property as a game between verifier and
refuter. The refuter moves through the state space of Q1 (thereby producing a
trace $\pi_1$), and the verifier reacts to each move by choosing a successor in the state
space of Q2 (thereby producing a trace $\pi_2$). If the verifier can ensure that the
resulting traces $\pi_1, \pi_2$ satisfy $\Box(a_{\pi_1} = a_{\pi_2})$, the $\forall\exists$ property holds. However, this
game-based interpretation fails in many instances: A witness trace $\pi_2$ might exist,
but it cannot be produced by a witness strategy because constructing it requires
knowledge of future moves of the refuter. Let us discuss this on the example
programs in Fig. 3. A simple (informal) solution to construct a witness trace $\pi_2$
(when given the entire $\pi_1$) would be to guarantee that at Q2:4 (meaning location
4 of Q2) and Q1:6 the value of $x$ in both programs agrees (i.e., $x_1 = x_2$ holds)
and then simply resolve the nondeterminism at Q2:6 with 0. However, to follow
this idea, the witness strategy for the verifier, when at Q2:3, would need to know
the future value of $x_1$ when Q1 is at location Q1:6.

Our insight in this paper is that we can turn the strategy-based interpretation
of the witness trace $\pi_2$ into a useful verification method by combining it with
a program reduction. As we express both searches strategically, we can phrase
the combined search as a combined game. In particular, both the reduction and
the witness strategy are controlled by the verifier and can thus collaborate. In
the resulting game, the verifier chooses a scheduling (as in Sect. 2.1) and, additionally, whenever the existentially quantified copy is scheduled, the verifier also
decides on the successor state of that copy. We depict a possible winning strategy in Fig. 3c. This strategy formalizes the interplay of reduction and witness
strategy. Initially, the verifier only schedules {1} until Q1 has reached program
location Q1:6 (at which point the value of $x$ is fixed). Only then does the verifier
schedule {2}, at which point the witness strategy can decide on a successor state
for Q2. In our case, the strategy chooses a value for $x$ such that $x_1 = x_2$ holds.
As we work in an abstraction of the actual system, we formalize this by restricting the abstract successor states. In particular, in state $s_7$ the verifier schedules
{2} and simultaneously restricts the successors to $\{s_8\}$ (i.e., the abstract state
where $x_1 = x_2$ holds), even though the abstract state $[(6,4), a_1 = a_2, x_1 \neq x_2]$ is
also a valid successor under scheduling {2}. We formalize when a restriction is
valid in Sect. 6. The resulting strategy is winning and therefore denotes both a
reduction and a witness strategy for the existentially quantified copy. Importantly,
the reduction and the witness strategy are mutually dependent. Our tool HyPA is
able to verify both properties (in Fig. 2 and Fig. 3) in a matter of a few seconds
(cf. Sect. 7).
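The interplay of reduction and witness strategy can be mimicked on concrete executions. The Python sketch below assumes our reading of Fig. 3 (spelled out in the first comment) and, as a further assumption, that $a$ starts at 0 in both programs. Random choices play the refuter in Q1. The witness strategy for Q2 is consulted only after Q1 has fixed $x$ (location Q1:6): it copies $x_1$ at Q2:3 and resolves the choice at Q2:6 with 0. The sketch samples plays rather than proving the property.

```python
import random

# Assumed reading of Fig. 3 (hypothetical where the figure is unclear):
# Q1: x <- ⋆_N; while ⋆_B do x <- x + 1; y <- x; while y > 0 do a <- a + x; y <- y - 1
# Q2: x <- ⋆_N; y <- x; while y > 0 do a <- a + x + ⋆_N; y <- y - 1


def play_round(rng, a1, a2):
    """One outer iteration: Q1 is driven by the refuter (random choices),
    Q2 by the witness strategy."""
    # Scheduling {1}: Q1 runs alone until it reaches Q1:6, which fixes x1.
    x1 = rng.randrange(10)           # Q1:3, x <- ⋆_N
    while rng.random() < 0.5:        # Q1:4-5, while ⋆_B do x <- x + 1
        x1 += 1
    # Scheduling {2}: the witness strategy now knows x1 and copies it.
    x2 = x1                          # Q2:3, ⋆_N resolved with x1
    y1 = x1
    while y1 > 0:                    # Q1:7-9
        a1 += x1
        y1 -= 1
    y2 = x2
    while y2 > 0:                    # Q2:5-7
        a2 += x2 + 0                 # Q2:6, ⋆_N resolved with 0
        y2 -= 1
    return a1, a2


def play(seed, rounds=5):
    """Pairs of values printed by Q1 and Q2 in the first `rounds` iterations."""
    rng = random.Random(seed)
    a1 = a2 = 0                      # assumption: a starts at 0 in both programs
    printed = []
    for _ in range(rounds):
        printed.append((a1, a2))     # both copies at print(a)
        a1, a2 = play_round(rng, a1, a2)
    return printed


print(all(o1 == o2 for o1, o2 in play(seed=42)))  # True
```

Moving the choice of $x_2$ before Q1 reaches Q1:6 would force the strategy to guess $x_1$, which is exactly the failure of the plain witness-strategy reading described above.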


3 Preliminaries 


We begin by introducing basic preliminaries, including our basic model of com- 
putation and background on (finite-state) safety games. 


Symbolic Transition Systems. We assume some fixed underlying first-order theory. A symbolic transition system (STS) is a tuple $T = (X, init, step)$ where $X$
is a finite set of variables (possibly sorted), $init$ is a formula over $X$ describing
all initial states, and $step$ is a formula over $X \uplus X'$ (where $X' = \{x' \mid x \in X\}$ is
the set of primed variables) describing the transitions of the system. A concrete
state $\mu$ in $T$ is an assignment to the variables in $X$. We write $\mu'$ for the assignment over $X'$ given by $\mu'(x') := \mu(x)$. A trace in $T$ is an infinite sequence of
assignments $\mu_0\mu_1\cdots$ such that $\mu_0 \models init$ and for every $i \in \mathbb{N}$, $\mu_i \uplus \mu'_{i+1} \models step$.
We write $Traces(T)$ for the set of all traces in $T$. We can naturally interpret
programs as STS by making the program counter explicit.
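As an illustration of the definition, the Python sketch below represents $init$ and $step$ as Python predicates over assignments (dictionaries) instead of first-order formulas, and checks whether a finite sequence of assignments is a prefix of some trace: $\mu_0 \models init$ and every consecutive pair satisfies $step$. The example STS, a one-statement loop with an explicit program counter, is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Assignment = Dict[str, int]


@dataclass(frozen=True)
class STS:
    variables: FrozenSet[str]
    init: Callable[[Assignment], bool]               # stands in for a formula over X
    step: Callable[[Assignment, Assignment], bool]   # stands in for a formula over X and X'


def is_trace_prefix(sts: STS, prefix: List[Assignment]) -> bool:
    """mu_0 |= init, and (mu_i, mu_{i+1}) |= step for all consecutive positions."""
    if not prefix or not sts.init(prefix[0]):
        return False
    return all(sts.step(cur, nxt) for cur, nxt in zip(prefix, prefix[1:]))


# Hypothetical program "1: repeat  2: x <- x + 1" with an explicit program counter.
counter = STS(
    variables=frozenset({"pc", "x"}),
    init=lambda s: s["pc"] == 1 and s["x"] == 0,
    step=lambda s, t: (s["pc"] == 1 and t == {"pc": 2, "x": s["x"]})
    or (s["pc"] == 2 and t == {"pc": 1, "x": s["x"] + 1}),
)

good = [{"pc": 1, "x": 0}, {"pc": 2, "x": 0}, {"pc": 1, "x": 1}]
bad = [{"pc": 1, "x": 0}, {"pc": 2, "x": 5}]
print(is_trace_prefix(counter, good), is_trace_prefix(counter, bad))  # True False
```

With an SMT solver, the same checks become satisfiability queries on $init$ and $step$ instantiated with concrete values.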


Formula Transformations. For the remainder of this paper, we fix the set of
system variables $X$. We also fix a finite set of trace variables $V = \{\pi_1, \ldots, \pi_k\}$.
For a trace variable $\pi \in V$ we define $X_\pi := \{x_\pi \mid x \in X\}$ and write $\mathcal{X}$ for
$X_{\pi_1} \uplus \cdots \uplus X_{\pi_k}$. For a formula $\theta$ over $X$, we define $\theta_{(\pi)}$ as the formula over
$X_\pi$ obtained by replacing every variable $x$ with $x_\pi$. Similarly, we define $k$ fresh
disjoint copies $\mathcal{X}' = X'_{\pi_1} \uplus \cdots \uplus X'_{\pi_k}$ (where $X'_\pi = \{x'_\pi \mid x \in X\}$). For a formula
$\theta$ over $\mathcal{X}$, we define $\theta'$ as the formula over $\mathcal{X}'$ obtained by replacing every
variable $x_\pi$ with $x'_\pi$.


Safety Games. A safety game is a tuple $G = (S_{\mathrm{SAFE}}, S_{\mathrm{REACH}}, S_0, T, B)$ where $S = S_{\mathrm{SAFE}} \uplus S_{\mathrm{REACH}}$ is a set of game states, $S_0 \subseteq S$ a set of initial states, $T \subseteq S \times S$
a transition relation, and $B \subseteq S$ a set of bad states. We assume that for every
$s \in S$ there exists at least one $s'$ with $(s, s') \in T$. States in $S_{\mathrm{SAFE}}$ are controlled by
player SAFE and those in $S_{\mathrm{REACH}}$ by player REACH. A play is an infinite sequence of
states $s_0s_1\cdots$ such that $s_0 \in S_0$, and $(s_i, s_{i+1}) \in T$ for every $i \in \mathbb{N}$. A positional
strategy $\sigma$ for player $p \in \{\mathrm{SAFE}, \mathrm{REACH}\}$ is a function $\sigma : S_p \to S$ such that
$(s, \sigma(s)) \in T$ for every $s \in S_p$. A play $s_0s_1\cdots$ is compatible with strategy $\sigma$ for
player $p$ if $s_{i+1} = \sigma(s_i)$ whenever $s_i \in S_p$. The safety player wins $G$ if there is a
strategy $\sigma$ for SAFE such that all $\sigma$-compatible plays never visit a state in $B$. In
particular, SAFE needs to win from all initial states.
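Because a positional strategy fixes all of SAFE's moves, checking that a given $\sigma$ is winning reduces to a reachability question: explore every $\sigma$-compatible play from the initial states and make sure no bad state is reachable. A minimal Python sketch over an explicit game graph with successor lists (this encoding and the tiny example game are assumptions of the illustration):

```python
from collections import deque


def strategy_is_winning(safe_states, initial, edges, bad, sigma):
    """Return True iff no sigma-compatible play from `initial` visits `bad`.

    safe_states: states owned by SAFE
    edges: dict mapping every state to its list of successors (the relation T)
    sigma: dict mapping every SAFE state to its chosen successor
    """
    assert all(sigma[s] in edges[s] for s in safe_states)  # legal choices only
    seen, queue = set(initial), deque(initial)
    while queue:
        s = queue.popleft()
        if s in bad:
            return False
        # SAFE follows sigma; REACH may take any outgoing edge.
        succs = [sigma[s]] if s in safe_states else edges[s]
        for t in succs:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True


# Hypothetical game: SAFE owns "a"; REACH owns "b", "c", and "x"; "x" is bad.
edges = {"a": ["b", "c"], "b": ["a"], "c": ["a", "x"], "x": ["x"]}
safe = {"a"}
print(strategy_is_winning(safe, ["a"], edges, {"x"}, {"a": "b"}))  # True
print(strategy_is_winning(safe, ["a"], edges, {"x"}, {"a": "c"}))  # False
```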


4 Observation-Based HyperLTL 


In this section, we present OHyperLTL (short for observation-based HyperLTL). 
Our logic builds upon HyperLTL [21], which itself extends linear-time temporal 
logic (LTL) with explicit trace quantification. In OHyperLTL, we include predi- 
cates from the background theory (to reason about infinite variable domains) and 
explicit observations (to express asynchronous properties). Formulas in OHyperLTL are given by the following grammar:²


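$$\varphi := \forall \pi : \xi.\ \varphi \mid \exists \pi : \xi.\ \varphi \mid \phi$$
$$\phi := \theta \mid \neg\phi \mid \phi_1 \land \phi_2 \mid \bigcirc\phi \mid \phi_1\,\mathcal{U}\,\phi_2$$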


Here $\pi \in V$ is a trace variable, $\theta$ is a formula over $\mathcal{X}$, and $\xi$ is a formula over
$X$ (called the observation formula). For ease of notation, we assume that all
variables in $V$ occur in the quantifier prefix exactly once. We use the standard
Boolean connectives $\lor$, $\rightarrow$, $\leftrightarrow$, and constants $\top, \bot$, as well as the derived LTL
operators eventually $\Diamond\phi := \top\,\mathcal{U}\,\phi$, and globally $\Box\phi := \neg\Diamond\neg\phi$.


Semantics. A trace $t$ is an infinite sequence $\mu_0\mu_1\cdots$ of assignments to $X$. For
$i \in \mathbb{N}$, we write $t(i)$ to denote the $i$th value in $t$. A trace assignment $\Pi$ is a partial
mapping of trace variables in $V$ to traces. Given a trace assignment $\Pi$ and $i \in \mathbb{N}$,
we define $\Pi(i)$ to be the assignment to $\mathcal{X}$ given by $\Pi(i)(x_\pi) := \Pi(\pi)(i)(x)$, i.e.,
the value of $x_\pi$ is the value of $x$ on the trace assigned to $\pi$. For the LTL body
of an OHyperLTL formula, we define:


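$$
\begin{aligned}
\Pi, i &\models \theta && \text{iff } \Pi(i) \models \theta\\
\Pi, i &\models \neg\phi && \text{iff } \Pi, i \not\models \phi\\
\Pi, i &\models \phi_1 \land \phi_2 && \text{iff } \Pi, i \models \phi_1 \text{ and } \Pi, i \models \phi_2\\
\Pi, i &\models \bigcirc\phi && \text{iff } \Pi, i+1 \models \phi\\
\Pi, i &\models \phi_1\,\mathcal{U}\,\phi_2 && \text{iff } \exists j \geq i.\ \Pi, j \models \phi_2 \text{ and } \forall i \leq k < j.\ \Pi, k \models \phi_1
\end{aligned}
$$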


The distinctive feature of OHyperLTL compared to HyperLTL is its explicit observations. Given an observation formula $\xi$ and trace $t$, we say that $\xi$ is a valid

² For the examples in Sect. 2, we additionally annotated quantifiers with an STS when we
want to reason about different STSs within the same formula. In the following, we
assume that all quantifiers range over traces in the same STS to simplify notation.
observation on $t$ (written $valid(t, \xi)$) if there are infinitely many $i \in \mathbb{N}$ such that
$t(i) \models \xi$. If $valid(t, \xi)$ holds, we write $(t)_\xi$ for the trace obtained by projecting
on those positions $i$ where $t(i) \models \xi$, i.e., $(t)_\xi(i) = t(j)$ where $j$ is the $i$th index
that satisfies $\xi$. Given a set of traces $T$ we resolve trace quantification as follows:

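$$
\begin{aligned}
\Pi &\models_T \phi && \text{iff } \Pi, 0 \models \phi\\
\Pi &\models_T \forall \pi : \xi.\ \varphi && \text{iff } \forall t \in \{t \in T \mid valid(t, \xi)\}.\ \Pi[\pi \mapsto (t)_\xi] \models_T \varphi\\
\Pi &\models_T \exists \pi : \xi.\ \varphi && \text{iff } \exists t \in \{t \in T \mid valid(t, \xi)\}.\ \Pi[\pi \mapsto (t)_\xi] \models_T \varphi
\end{aligned}
$$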


The semantics mostly agrees with that of HyperLTL [21] but projects each trace 
to the positions where the observation holds. Given an STS $T$ and OHyperLTL
formula $\varphi$, we write $T \models \varphi$ if $\emptyset \models_{Traces(T)} \varphi$ where $\emptyset$ is the empty assignment.
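Observations act on individual traces. For ultimately periodic (lasso-shaped) traces $u\,v^\omega$, both $valid(t, \xi)$ and $(t)_\xi$ are easy to compute: $\xi$ holds infinitely often iff it holds somewhere in the loop $v$, and $(t)_\xi$ is again a lasso obtained by filtering stem and loop. The lasso representation is an assumption of this Python sketch; the semantics above is defined on arbitrary infinite traces.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Lasso = Tuple[List[State], List[State]]  # (stem u, non-empty loop v): t = u v^omega


def valid(t: Lasso, xi: Callable[[State], bool]) -> bool:
    """valid(t, xi): xi holds at infinitely many positions of t."""
    _, loop = t
    return any(xi(s) for s in loop)


def project(t: Lasso, xi: Callable[[State], bool]) -> Lasso:
    """(t)_xi: keep exactly the positions where xi holds (requires valid(t, xi))."""
    if not valid(t, xi):
        raise ValueError("observation is not valid on this trace")
    stem, loop = t
    return [s for s in stem if xi(s)], [s for s in loop if xi(s)]


def at(t: Lasso, i: int) -> State:
    """t(i) for a lasso trace."""
    stem, loop = t
    return stem[i] if i < len(stem) else loop[(i - len(stem)) % len(loop)]


# Observe only at the output location pc = 2.
t = ([{"pc": 1, "x": 0}], [{"pc": 2, "x": 0}, {"pc": 1, "x": 0}, {"pc": 1, "x": 0}])
obs = lambda s: s["pc"] == 2
print([at(project(t, obs), i)["pc"] for i in range(3)])  # [2, 2, 2]
```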


The Power of Observations. The explicit observations in OHyperLTL facilitate 
the specification of asynchronous hyperproperties, i.e., properties where traces 
are traversed at different speeds. For the example in Sect. 2.1, the explicit obser- 
vations allow us to compare the output of both programs even though the actual 
step at which the output occurs (in a synchronous semantics) differs between 
both programs (as P1 takes the inner loop twice as often as P2). As the observa- 
tions are part of the specification, we can model a broad spectrum of properties 
ranging, e.g., from timing-insensitive properties (by placing observations only at 
output locations) to timing-sensitive specifications [29] (by placing observations 
at closer intervals). Functional (as opposed to temporal) k-safety properties specified by pre- and postconditions [10,39,41] can easily be encoded as $\forall^*$-OHyperLTL
properties by placing observations at the start and end of each program. By setting $\xi = \top$, i.e., observing every step, we can express synchronous properties.
OHyperLTL thus subsumes HyperLTL.


Finite-State Model Checking. Many mechanisms used to express asynchronous 
hyperproperties render finite-state model checking undecidable [9,17,31]. In contrast, the simple mechanism used in OHyperLTL maintains decidable finite-state
model checking. Detailed proofs can be found in the full version [15]. 


Theorem 1. Assume an STS $T$ with finite variable domains and decidable background theory and an OHyperLTL formula $\varphi$. It is decidable whether $T \models \varphi$.


Proof Sketch. Under the assumptions, we can view $T$ as an explicit (instead of
symbolic) finite-state transition system. Given an observation formula $\xi$ we can
effectively compute an explicit finite-state system $T'$ such that $Traces(T') = \{(t)_\xi \mid t \in Traces(T) \land valid(t, \xi)\}$. This reduces OHyperLTL model checking on
$T$ to HyperLTL model checking on $T'$, which is decidable [28].
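For an explicit finite-state system, one way to realize $T'$ from the proof sketch is a graph search: $T'$ keeps only the states satisfying $\xi$, has an edge from $s$ to $s'$ if $s'$ is reachable from $s$ by a non-empty path whose intermediate states all violate $\xi$, and its initial states are the first $\xi$-states reachable from an initial state. The Python sketch below assumes the system is given as successor lists; it is our illustration, not necessarily the construction used in the full version [15]. Dead ends of $T'$ generate no infinite traces and could additionally be pruned.

```python
from collections import deque


def next_observations(graph, start, xi):
    """Observation states reachable from `start` by a non-empty path whose
    intermediate states (excluding start and end) all violate xi."""
    result, seen = set(), set()
    queue = deque(graph[start])
    while queue:
        s = queue.popleft()
        if s in seen:
            continue
        seen.add(s)
        if xi(s):
            result.add(s)              # stop at the first observation
        else:
            queue.extend(graph[s])     # keep searching through non-observations
    return result


def project_system(graph, initial, xi):
    """Explicit system T' whose infinite paths are the xi-projections of the
    paths of `graph` that visit xi infinitely often."""
    init_obs = {s for s in initial if xi(s)}
    for s in initial:
        if not xi(s):
            init_obs |= next_observations(graph, s, xi)
    edges = {s: next_observations(graph, s, xi) for s in graph if xi(s)}
    return init_obs, edges


# Hypothetical system: 1 -> 2 -> 3 -> 1 observes state 2 forever; the branch
# 1 -> 4 -> 4 -> ... never observes and therefore disappears in T'.
graph = {1: [2, 4], 2: [3], 3: [1], 4: [4]}
init, edges = project_system(graph, [1], lambda s: s == 2)
print(init, edges)  # {2} {2: {2}}
```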


Note that for infinite-state (symbolic) systems, we cannot effectively compute
$T'$ as in the proof of Theorem 1. In fact, there may not even exist a system $T'$
with the desired property that is expressible in the same background theory.

The finite-state result in Theorem 1 is of little relevance for the present 
paper. Nevertheless, it indicates that our logic is well suited for verification of 
infinite-state (software) systems as the (inevitable) undecidability stems from 
the infinite domains in software programs and not already from the logic itself. 
Safety. In this paper, we assume that the hyperproperty is temporally safe [12],
i.e., the temporal body of any OHyperLTL formula denotes a safety property.
Note that, as we support quantifier alternation, we can still express hyperliveness
properties [22,23]. For example, GNI is both temporally safe and a hyperliveness property.
We model the body of a formula by a symbolic safety automaton [24], which is a
tuple $\mathcal{A} = (Q, q_0, \delta, B)$ where $Q$ is a finite set of states, $q_0 \in Q$ the initial state,
$B \subseteq Q$ a set of bad states, and $\delta$ a finite set of automaton edges of the form
$(q, \theta, q')$ where $q, q' \in Q$ are states and $\theta$ is a formula over $\mathcal{X}$. Given a trace $t$
over assignments to $\mathcal{X}$, a run of $\mathcal{A}$ on $t$ is an infinite sequence of states $q_0q_1\cdots$
(starting in $q_0$) such that for every $i$, there exists an edge $(q_i, \theta_i, q_{i+1}) \in \delta$ such
that $t(i) \models \theta_i$. A word is accepted by $\mathcal{A}$ if it has no run that visits a state in $B$.
The automaton is deterministic if for every $q \in Q$ and every assignment $\mu$ to
$\mathcal{X}$, there exists exactly one edge $(q, \theta, q') \in \delta$ with $\mu \models \theta$.
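A deterministic symbolic safety automaton can be executed step by step on a finite prefix; a prefix whose unique run reaches $B$ refutes the safety body. In the Python sketch below, edge formulas are Python predicates over the combined assignment (a stand-in for formulas over $\mathcal{X}$), and the example automaton for $\Box(x_{\pi_1} = x_{\pi_2})$ is hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Assignment = Dict[str, int]
Edge = Tuple[Hashable, Callable[[Assignment], bool], Hashable]


def successor(edges: List[Edge], q: Hashable, mu: Assignment) -> Hashable:
    """Successor of q on mu; determinism means exactly one edge is enabled."""
    enabled = [q2 for (q1, guard, q2) in edges if q1 == q and guard(mu)]
    if len(enabled) != 1:
        raise ValueError(f"automaton is not deterministic at {q!r}")
    return enabled[0]


def violates(edges: List[Edge], q0: Hashable, bad: Set[Hashable],
             prefix: List[Assignment]) -> bool:
    """True iff the unique run on the finite prefix visits a bad state."""
    q = q0
    for mu in prefix:
        if q in bad:
            return True
        q = successor(edges, q, mu)
    return q in bad


# Automaton for G(x_pi1 == x_pi2): stay in "ok" while equal, else move to "err".
eq = lambda mu: mu["x_pi1"] == mu["x_pi2"]
edges = [("ok", eq, "ok"), ("ok", lambda mu: not eq(mu), "err"),
         ("err", lambda mu: True, "err")]
print(violates(edges, "ok", {"err"}, [{"x_pi1": 1, "x_pi2": 1}]))  # False
print(violates(edges, "ok", {"err"},
               [{"x_pi1": 1, "x_pi2": 1}, {"x_pi1": 2, "x_pi2": 3}]))  # True
```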


5 Reductions as a Game 


After having defined our temporal logic, we turn our attention to the automatic 
verification of OHyperLTL formulas on STSs. In this section, we begin by for- 
malizing our game-based interpretation of a reduction. To illustrate this, we 
consider $\forall^*$ OHyperLTL formulas, which, as the body of the formula is a safety
property, always denote k-safety properties. 


Predicate Abstraction. Our search for a reduction is carried out within a
fixed predicate abstraction [30,33], i.e., we abstract our system by keeping track
of the truth values of a few selected predicates that (ideally) identify properties
that are relevant to prove the property in question. Let $T = (X, init, step)$ be
an STS and let $\varphi = \forall \pi_1 : \xi_1 \ldots \forall \pi_k : \xi_k.\ \phi$ be the ($k$-safety) OHyperLTL formula we
wish to verify. Let $\mathcal{A}_\phi = (Q_\phi, q_{\phi,0}, \delta_\phi, B_\phi)$ be a deterministic safety automaton
for $\phi$. A relational predicate $p$ is a formula over $\mathcal{X}$ that identifies a property of the
combined state space of $k$ system copies. Let $P = \{p_1, \ldots, p_n\}$ be a finite
set of relational predicates. We say a formula over $\mathcal{X}$ is expressible in $P$ if it is
equivalent to a boolean combination of the predicates in $P$. We assume that all
edge formulas in the automaton $\mathcal{A}_\phi$, and formulas $init_{(\pi_i)}$ and $(\xi_i)_{(\pi_i)}$ for $\pi_i \in V$
are expressible in $P$. Note that we can always add missing predicates to $P$.

Given the set of predicates $P$, the state space of the abstraction w.r.t. $P$ is
given by $\mathbb{B}^n$, where for each abstract state $\hat{s} \in \mathbb{B}^n$, the $i$th position $\hat{s}[i] \in \mathbb{B}$ tracks
whether or not predicate $p_i$ holds. To simplify notation, we write $ite(b, \theta, \theta')$ for
the formula $\theta$ if $b = \top$, and $\theta'$ otherwise. For each abstract state $\hat{s} \in \mathbb{B}^n$, we
define $[\hat{s}] := \bigwedge_{i=1}^{n} ite(\hat{s}[i], p_i, \neg p_i)$, i.e., $[\hat{s}]$ is a formula over $\mathcal{X}$ that captures
all concrete states that are abstracted to $\hat{s}$. To incorporate reductions in our
abstraction, we parametrize the abstract transition relation by a scheduling $M \subseteq \{\pi_1, \ldots, \pi_k\}$. We lift the step formula from $T$ by defining


$$step_M := \bigwedge_{i=1}^{k} ite\Big(\pi_i \in M,\ step_{(\pi_i)},\ \bigwedge_{x \in X} x'_{\pi_i} = x_{\pi_i}\Big)$$
That is, all copies in $M$ take a step while all other copies remain unchanged.
Given two abstract states $\hat{s}_1, \hat{s}_2$ we say that $\hat{s}_2$ is an $M$-successor of $\hat{s}_1$, written
$\hat{s}_1 \xrightarrow{M} \hat{s}_2$, if $[\hat{s}_1] \land [\hat{s}_2]' \land step_M$ is satisfiable, i.e., we can transition from $\hat{s}_1$ to
$\hat{s}_2$ by only progressing the copies in $M$.

For an abstract state $\hat{s}$, we define $obs(\hat{s}) \in \mathbb{B}^k$ as the boolean vector that
indicates which copy (of $\pi_1, \ldots, \pi_k$) is currently at an observation point, i.e.,
$obs(\hat{s})[i] = \top$ iff $[\hat{s}] \land (\xi_i)_{(\pi_i)}$ is satisfiable. Note that as $(\xi_i)_{(\pi_i)}$ is, by assumption,
expressible in $P$, either all or none of the concrete states in $[\hat{s}]$ satisfy $(\xi_i)_{(\pi_i)}$.
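On small finite domains, the abstraction can be explored by brute-force enumeration instead of SMT queries. The Python sketch below enumerates concrete states of two copies, maps each to its abstract state (the vector of predicate truth values), and computes $M$-successors by checking $step_M$ on concrete pairs; this enumeration replaces the satisfiability check of $[\hat{s}_1] \land [\hat{s}_2]' \land step_M$. The program (each copy increases $x$ by 1 or 2, capped at 3), the domain, and the two predicates are hypothetical, and copies are indexed from 0.

```python
from itertools import product

DOMAIN = range(4)                         # hypothetical finite domain for x
PREDICATES = [lambda s: s[0] == s[1],     # p1: x_pi1 = x_pi2
              lambda s: s[0] < s[1]]      # p2: x_pi1 < x_pi2


def step(x, x_next):
    """Hypothetical single-copy transition: x' = x + 1 or x' = x + 2 (capped at 3)."""
    return x_next in {min(x + 1, 3), min(x + 2, 3)}


def alpha(s):
    """Abstract state of concrete state s: the truth value of each predicate."""
    return tuple(p(s) for p in PREDICATES)


def step_m(s, t, m):
    """step_M: copies in M take a step, all other copies stay unchanged."""
    return all(step(s[i], t[i]) if i in m else s[i] == t[i] for i in range(2))


def m_successors(abs_state, m):
    """All abstract states reachable from abs_state by one step_M transition."""
    states = list(product(DOMAIN, repeat=2))
    return {alpha(t) for s in states if alpha(s) == abs_state
            for t in states if step_m(s, t, m)}


print(m_successors((True, False), {0}))     # only the first copy moves
print(m_successors((True, False), {0, 1}))  # lock-step
```

Scheduling only the first copy can never produce $x_{\pi_1} < x_{\pi_2}$ from equal values, whereas lock-step can: different schedulings induce different abstract successor sets, which is what the game below exploits.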


Game Construction. Building on the parametrized abstract transition relation,
we can construct a (finite-state) safety game where winning strategies for the
verifier correspond to valid reductions with accompanying proofs. The nodes in
our game have two forms: Either they are of the form $(\hat{s}, q, b)$ where $\hat{s} \in \mathbb{B}^n$ is
an abstract state, $q \in Q_\phi$ a state of the safety automaton, and $b \in \mathbb{B}^k$ a boolean
vector indicating which copy has moved since the last automaton step; or of the
form $(\hat{s}, q, b, M)$ where $\hat{s}$, $q$, and $b$ are as before and $\emptyset \neq M \subseteq \{\pi_1, \ldots, \pi_k\}$ is a
scheduling. The initial states are all states $(\hat{s}, q_{\phi,0}, \top^k)$ where $[\hat{s}] \land \bigwedge_{i=1}^{k} init_{(\pi_i)}$
is satisfiable (recall that $init_{(\pi_i)}$ is expressible in $P$). We mark a state $(\hat{s}, q, b)$ or
$(\hat{s}, q, b, M)$ as losing iff $q \in B_\phi$. For automaton state $q \in Q_\phi$ and abstract state $\hat{s}$,
we define $\delta_\phi(q, \hat{s})$ as the unique state $q'$ such that there is an edge $(q, \theta, q') \in \delta_\phi$
such that $[\hat{s}] \land \theta$ is satisfiable. Uniqueness follows from the assumption that $\mathcal{A}_\phi$ is
deterministic and all edge formulas are expressible in $P$. The transition relation
of our game is given by the following rules:


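$$
\text{(1)}\ \frac{\forall \pi_i \in M.\ \neg b[i] \lor \neg obs(\hat{s})[i]}{(\hat{s}, q, b) \rightsquigarrow (\hat{s}, q, b, M)}
\qquad
\text{(2)}\ \frac{obs(\hat{s}) = \top^k \qquad q' = \delta_\phi(q, \hat{s})}{(\hat{s}, q, \top^k) \rightsquigarrow (\hat{s}, q', \bot^k)}
$$

$$
\text{(3)}\ \frac{\hat{s} \xrightarrow{M} \hat{s}' \qquad b' = b[i \mapsto \top]_{\pi_i \in M}}{(\hat{s}, q, b, M) \rightsquigarrow (\hat{s}', q, b')}
$$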

In rule (1), we select any scheduling that schedules only copies that have not
reached an observation point or have not moved since the last automaton step.
In particular, we cannot schedule any copy that has moved and already reached
an observation point. In rule (2), all copies have reached an observation point and
have moved since the last update (i.e., $b = \top^k$), so we progress the automaton
and reset $b$. Lastly, in rule (3), we select an $M$-successor of $\hat{s}$ and update $b$ for
all copies that take part in the step. In our game, player SAFE takes the role
of the verifier, and player REACH that of the refuter. It is the safety player's
responsibility to select a scheduling in each step, so we assign nodes of the form
$(\hat{s}, q, b)$ to SAFE. Nodes of the form $(\hat{s}, q, b, M)$ are controlled by REACH, who can
choose an abstract $M$-successor. Let $\mathcal{G}_{(T,\varphi,P)}$ be the resulting (finite-state) safety
game. A winning strategy for SAFE in $\mathcal{G}_{(T,\varphi,P)}$ picks, in each abstract state, a
valid scheduling that prevents a visit to a losing state. We can thus show:


Theorem 2. If player SAFE wins $\mathcal{G}_{(T,\varphi,P)}$, then $T \models \varphi$.
Proof Sketch. Assume $\sigma$ is a winning strategy for SAFE in $\mathcal{G}_{(T,\varphi,P)}$. Let
$t_1, \ldots, t_k \in Traces(T)$ be arbitrary. We iteratively construct stuttered versions $t'_1, \ldots, t'_k$ of $t_1, \ldots, t_k$ by querying $\sigma$ on abstracted prefixes of $t'_1, \ldots, t'_k$:
Whenever $\sigma$ schedules copy $i$, we take a proper step on $t_i$; otherwise we stutter. By construction of $\mathcal{G}_{(T,\varphi,P)}$, the stuttered traces $t'_1, \ldots, t'_k$ align at observation points. In particular, we have $[\pi_1 \mapsto (t'_1)_{\xi_1}, \ldots, \pi_k \mapsto (t'_k)_{\xi_k}] \models \phi$ iff
$[\pi_1 \mapsto (t_1)_{\xi_1}, \ldots, \pi_k \mapsto (t_k)_{\xi_k}] \models \phi$. Moreover, the sequence of abstract states in
$\mathcal{G}_{(T,\varphi,P)}$ forms an abstraction of $t'_1, \ldots, t'_k$ and shows that $\mathcal{A}_\phi$ cannot reach a bad
state when reading $(t'_1)_{\xi_1}, \ldots, (t'_k)_{\xi_k}$ (as $\sigma$ is winning). This already shows that
$[\pi_1 \mapsto (t'_1)_{\xi_1}, \ldots, \pi_k \mapsto (t'_k)_{\xi_k}] \models \phi$ and thus $[\pi_1 \mapsto (t_1)_{\xi_1}, \ldots, \pi_k \mapsto (t_k)_{\xi_k}] \models \phi$.

As this holds for all traces $t_1, \ldots, t_k \in Traces(T)$, we get $T \models \varphi$ as required.
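The stuttering construction in the proof sketch can be phrased as a small loop over finite prefixes: a scheduler decides which copies advance, and unscheduled copies repeat their current state. In the Python sketch below, the scheduler is a hypothetical function of the current positions that stands in for $\sigma$ (which, in the proof, is queried on abstracted prefixes).

```python
from typing import Callable, FrozenSet, List, Sequence


def stutter(traces: Sequence[Sequence[int]],
            schedule: Callable[[List[int]], FrozenSet[int]],
            steps: int) -> List[List[int]]:
    """Build stuttered versions t'_1..t'_k of finite trace prefixes t_1..t_k.

    In each step, copies in schedule(positions) advance on their trace; all
    other copies stutter (repeat their current state). A copy whose finite
    prefix is exhausted also stutters, an artifact of using finite prefixes."""
    pos = [0] * len(traces)
    out = [[t[0]] for t in traces]
    for _ in range(steps):
        m = schedule(pos)
        for i, t in enumerate(traces):
            if i in m and pos[i] + 1 < len(t):
                pos[i] += 1           # proper step on t_i
            out[i].append(t[pos[i]])  # repeated state if copy i did not move
    return out


# Hypothetical scheduler: copy 0 advances twice for every step of copy 1.
t1, t2 = [0, 1, 2, 3, 4], [10, 11, 12]
sched = lambda pos: frozenset({0}) if pos[0] % 2 == 1 else frozenset({0, 1})
print(stutter([t1, t2], sched, 4))  # [[0, 1, 2, 3, 4], [10, 11, 11, 12, 12]]
```

Removing consecutive repetitions from each stuttered prefix recovers the original prefix, which is why stuttering does not change which states the copies visit.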


Game Construction and Complexity. If the background theory is decidable,
$\mathcal{G}_{(T,\varphi,P)}$ can be constructed effectively using at most $2^{|P|+1} \cdot 2^k$ queries to an
SMT solver. Checking if SAFE wins $\mathcal{G}_{(T,\varphi,P)}$ can be done with a simple fixpoint
computation of the attractor in linear time.
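The attractor computation can be sketched as a backward fixpoint from the bad states: a REACH state is attracted as soon as one successor is attracted, a SAFE state only once all of its successors are. Keeping a counter of not-yet-attracted successors for every SAFE state makes the computation linear in the number of states and transitions. SAFE wins iff no initial state is attracted. The explicit encoding with duplicate-free successor lists and the example game are assumptions of this Python sketch.

```python
from collections import deque


def safe_wins(safe_states, states, initial, edges, bad):
    """Return True iff SAFE wins the safety game from all initial states.

    Computes the REACH attractor of `bad` in O(|S| + |T|) time."""
    preds = {s: [] for s in states}
    for s in states:
        for t in edges[s]:
            preds[t].append(s)
    # A SAFE state is attracted only once all of its successors are attracted.
    remaining = {s: len(edges[s]) for s in safe_states}
    attractor = set(bad)
    queue = deque(bad)
    while queue:
        t = queue.popleft()
        for s in preds[t]:
            if s in attractor:
                continue
            if s in safe_states:
                remaining[s] -= 1
                if remaining[s] > 0:
                    continue
            attractor.add(s)  # REACH can force a visit to `bad` from s
            queue.append(s)
    return not any(s in attractor for s in initial)


# Hypothetical game: SAFE owns "a"; REACH owns "b", "c", and "x"; "x" is bad.
edges = {"a": ["b", "c"], "b": ["a"], "c": ["a", "x"], "x": ["x"]}
print(safe_wins({"a"}, edges, ["a"], edges, {"x"}))  # True: SAFE always moves to "b"
```

A winning positional strategy for SAFE can be read off by choosing, in every non-attracted SAFE state, any successor outside the attractor.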

Our game-based method of finding a reduction in a given abstraction is
closely related to the notion of a property-directed self-composition [39]. The
only previously known algorithm for finding such a reduction is based on an optimized enumeration [39], which, in the worst case, requires $O(2^{|P|+1} \cdot 2^k)$ many
enumerations. Our worst-case complexity thus matches the bounds inferred by
[39], but avoids the explicit enumeration of reductions (and the concomitant 
repeated construction of the abstract state-space) and is, as we believe, concep- 
tually simpler to comprehend. Moreover, our game-based technique is the key 
stepping stone for extending our method beyond k-safety in Sect. 6. 


6 Verification Beyond k-Safety 


Building on the game-based interpretation of a reduction, we extend our verification beyond $\forall^*$ properties to support $\forall^*\exists^*$ properties. We accomplish this
by combining the game-based reading of a reduction (as discussed in the previous section) with a game-based reading of existential quantification. For the
remainder of this section, fix an STS $T = (X, init, step)$ and let


$$\varphi = \forall \pi_1 : \xi_1 \ldots \forall \pi_l : \xi_l.\ \exists \pi_{l+1} : \xi_{l+1} \ldots \exists \pi_k : \xi_k.\ \phi$$


be the OHyperLTL formula we wish to check, i.e., we universally quantify over
$l$ traces followed by an existential quantification over $k - l$ traces. We assume
that for every existential quantification $\exists \pi_i : \xi_i$ occurring in $\varphi$, $valid(t, \xi_i)$ holds
for every $t \in Traces(T)$ (we discuss this later in Remark 1).


6.1 Existential Trace Quantification as a Game 




The idea of a game-based verification of $\forall^*\exists^*$ properties is to consider a $\forall^*\exists^*$-property as a game between verifier and refuter [23]. The refuter controls the $l$
universally quantified traces by moving through $l$ copies of the system (thereby
producing traces $\pi_1, \ldots, \pi_l$) and the verifier reacts by, incrementally, moving
through $k - l$ copies of the system (thereby producing traces $\pi_{l+1}, \ldots, \pi_k$). If the
verifier has a strategy that ensures that the resulting traces satisfy $\phi$, then $T \models \varphi$
holds. We call such a strategy for the verifier a witness strategy.

We combine this game-based reading of existential quantification with our 
game-based interpretation of a reduction by, additionally, letting the verifier control the scheduling of the system. When played on the concrete state space of
$T$, the game proceeds in three stages as follows: 1) The verifier selects a valid
scheduling $M \subseteq \{\pi_1, \ldots, \pi_k\}$; 2) The refuter selects successor states for all universally quantified copies by fixing an assignment to $X'_{\pi_1}, \ldots, X'_{\pi_l}$ (only moving
copies scheduled by $M$); 3) The verifier reacts by choosing successor states for the
existentially quantified copies by fixing an assignment to $X'_{\pi_{l+1}}, \ldots, X'_{\pi_k}$ (again,
only moving copies scheduled by $M$). Afterward, the process repeats.

As we work within a fixed abstraction of $T$, the verifier cannot, however,
choose concrete successor states directly but can only work at the precision captured
by the abstraction. Following the general scheme of abstract games, we, therefore, 
underapproximate the moves available to the verifier [2]. Formally, we abstract 
the three-stage game outlined before (which was played at the level of concrete 
states) to a simpler abstract game (consisting of only two stages). In the first 
stage, the verifier selects both a scheduling M and a restriction on the set of 
abstract successor states, i.e., a set of abstract states A. In the second stage, 
the refuter cannot choose any abstract successor state (any M-successor in the 
terminology from Sect.5), but only successors contained in the restriction A. 
To guarantee the soundness of this approach, we ensure that the verifier can 
only pick restrictions that are valid, i.e., restrictions that underapproximate the 
possibilities of the verifier on the level of concrete states. 


Game Construction. We modify our game from Sect. 5 as follows. States are either of the form $(\hat{s}, q, b)$ (as in Sect. 5) or of the form $(\hat{s}, q, b, M, A)$, where $\hat{s}$, $q$, $b$, and $M$ are as in Sect. 5, and $A$ is a set of abstract states (the restriction). To reflect the restriction, we modify transition rules (1) and (3). Rule (2) remains unchanged.


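$$\frac{\forall \pi_i \in M.\ \neg b[i] \lor \neg obs(\hat{s})[i] \qquad validRes_{\hat{s}}^{M}(A)}{(\hat{s}, q, b) \rightsquigarrow (\hat{s}, q, b, M, A)}\ (1) \qquad\qquad \frac{\hat{s}' \in A \qquad b' = b[i \mapsto \top]_{\pi_i \in M}}{(\hat{s}, q, b, M, A) \rightsquigarrow (\hat{s}', q, b')}\ (3)$$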
In rule (1), the safety player (who, again, takes the role of the verifier) selects both a scheduling $M$ and a restriction $A$ such that $validRes_{\hat{s}}^{M}(A)$ holds (which we define later). In rule (3), the reachability player (who takes the role of the refuter) can select any successor contained in $A$.


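Valid Restriction. The above game construction depends on the definition of $validRes_{\hat{s}}^{M}(A)$. Intuitively, $A$ is a valid restriction if it underapproximates the possibilities of a witness strategy that can pick concrete successor states for all existentially quantified traces. That is, for every concrete state in $\hat{s}$, a witness strategy (on the level of concrete states) can guarantee a move to a concrete state that is abstracted to an abstract state within $A$. Formally, we define $validRes_{\hat{s}}^{M}(A)$ as follows:

$$validRes_{\hat{s}}^{M}(A) := \forall \{X_{\pi_i}\}_{i=1}^{k}.\ \forall \{X'_{\pi_i}\}_{i=1}^{l}.\ \Big(\llbracket \hat{s} \rrbracket \land \bigwedge_{i=1}^{l} ite\big(\pi_i \in M,\ step(X_{\pi_i}, X'_{\pi_i}),\ \bigwedge_{x \in X} x'_{\pi_i} = x_{\pi_i}\big)\Big) \rightarrow \exists \{X'_{\pi_i}\}_{i=l+1}^{k}.\ \bigwedge_{i=l+1}^{k} ite\big(\pi_i \in M,\ step(X_{\pi_i}, X'_{\pi_i}),\ \bigwedge_{x \in X} x'_{\pi_i} = x_{\pi_i}\big) \land \bigvee_{\hat{s}' \in A} \llbracket \hat{s}' \rrbracket'$$

It expresses that for all concrete states in $\llbracket \hat{s} \rrbracket$ (assignments to $\{X_{\pi_i}\}_{i=1}^{k}$) and for all concrete successor states for the universally quantified copies (assignments to $\{X'_{\pi_i}\}_{i=1}^{l}$), there exist successor states for the existentially quantified copies (assignments to $\{X'_{\pi_i}\}_{i=l+1}^{k}$) such that one of the abstract states in $A$ is reached (here, $\llbracket \hat{s}' \rrbracket'$ denotes the concretization of $\hat{s}'$ over the primed variables).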
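To make the quantifier structure of $validRes_{\hat{s}}^{M}(A)$ concrete, the following Python sketch checks it by brute-force enumeration over a small finite toy domain instead of discharging an SMT query. The names `valid_res` and `copy_moves`, the representation of abstract states as Python predicates over tuples of concrete copy states, and the toy transition relation are all hypothetical and chosen for illustration only; copies are indexed from 0, and the first `l` of them are universally quantified.

```python
from itertools import product


def copy_moves(i, s_i, M, step):
    """Possible successors of copy i: a scheduled copy takes a step,
    an unscheduled copy stutters (its state stays unchanged)."""
    return step(s_i) if i in M else {s_i}


def valid_res(concretize, M, A, k, l, domain, step):
    """Brute-force check of validRes: copies 0..l-1 are universal,
    copies l..k-1 are existential; `concretize` and every element of A
    are predicates over a k-tuple of concrete copy states."""
    for s in product(domain, repeat=k):
        if not concretize(s):  # only concrete states inside the abstract state
            continue
        univ = [copy_moves(i, s[i], M, step) for i in range(l)]
        exist = [copy_moves(i, s[i], M, step) for i in range(l, k)]
        for u in product(*univ):  # every refuter move ...
            # ... must be answerable by some verifier move that lands in A
            if not any(any(a(u + e) for a in A) for e in product(*exist)):
                return False
    return True


# Toy system (hypothetical): a counter modulo 4 that nondeterministically adds 1 or 2.
def step(x):
    return {(x + 1) % 4, (x + 2) % 4}


def eq(t):
    return t[0] == t[1]


def neq(t):
    return t[0] != t[1]


# k = 2 copies, l = 1 universal copy (index 0), abstract state "x_1 = x_2".
print(valid_res(eq, {0, 1}, [eq], 2, 1, range(4), step))  # True: copy 1 mimics copy 0
print(valid_res(eq, {0}, [eq], 2, 1, range(4), step))     # False: copy 1 must stutter
print(valid_res(eq, {1}, [neq], 2, 1, range(4), step))    # True
```

Passing a larger restriction such as `[eq, neq]` can only turn `False` into `True`, which mirrors the upward closure stated in Lemma 1 (Sect. 6.2).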


Example 1. With this definition at hand, we can validate the restrictions chosen by the strategy in Fig. 3c. For example, in state $a_7$ the strategy schedules $M = \{2\}$ and restricts the successor states to $\{a_8\}$, even though the abstract state $[(6, 4), a_1 = a_2, x_1 \neq x_2]$ is also a $\{2\}$-successor of $a_7$. If we spell out $validRes_{a_7}^{\{2\}}(\{a_8\})$, we get

$$\forall X_1 \cup X_2 \cup X'_1.\ \Big(\llbracket a_7 \rrbracket \land \bigwedge_{z \in X} z'_1 = z_1\Big) \rightarrow \exists X'_2.\ step(X_2, X'_2) \land \llbracket a_8 \rrbracket'$$

where $X = \{a, x, y\}$. Here we assume that $step := (a' = a \land y' = y)$ is the update performed by the assignment to $x$ from Q2:3 to Q2:4. The above formula is valid.


Correctness. Call the resulting game $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$. The game combines the search for a reduction with that of a witness strategy (both within the precision captured by $\mathcal{P}$).³ We can show:


Theorem 3. If player SAFE wins $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$, then $T \models \varphi$.


Proof Sketch. Let $\sigma$ be a winning strategy for SAFE in $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$, and let $t_1, \ldots, t_l \in Traces(T)$ be arbitrary. We use $\sigma$ to incrementally construct witness traces $t_{l+1}, \ldots, t_k$ by querying $\sigma$. In every abstract state $\hat{s}$, $\sigma$ selects a scheduling $M$ and a restriction $A$ such that $validRes_{\hat{s}}^{M}(A)$ holds. We plug the current concrete state (reached in our construction of $t_{l+1}, \ldots, t_k$) into the universal quantification of $validRes_{\hat{s}}^{M}(A)$ and get (concrete) witnesses for the existential quantification that, by definition of $validRes_{\hat{s}}^{M}(A)$, are valid successors for the existentially quantified copies in $T$.


Remark 1. Recall that we assume that for every existential quantification $\exists \pi_i : \xi_i$ occurring in $\varphi$ and all $t \in Traces(T)$, $valid(t, \xi_i)$ holds. This is important to ensure that the safety player (the verifier) cannot avoid observation points forever. We could drop this assumption by strengthening the winning condition in $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$ and explicitly stating that, in order to win, SAFE needs to visit observation points on existentially quantified traces infinitely often.


3 In particular, $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$ (strictly) generalizes the construction of $\mathcal{G}_{(T,\varphi,\mathcal{P})}$ from Sect. 5: if $k = l$ (i.e., the property is a $\forall^*$-property), the unique minimal valid restriction from $\hat{s}, M$ is $\{\hat{s}' \mid \hat{s} \to_M \hat{s}'\}$, i.e., the set of all $M$-successors of $\hat{s}$. The safety player thus cannot be more restrictive than allowing all $M$-successors (as in $\mathcal{G}_{(T,\varphi,\mathcal{P})}$).
Clairvoyance vs. Abstraction. The cooperation between reduction (the ability of the verifier to select schedulings) and witness strategy (the ability to select restrictions on the successor) can be seen as a limited form of prophecy [1,14]. By first scheduling the universal copies, the witness strategy can peek at future moves before committing to a successor state, as we saw, e.g., in Fig. 3. The "theoretically optimal" reduction is thus a sequential one that first schedules only the universally quantified traces (until an observation point is reached) and thereby provides maximal information to the witness strategy. However, in the context of a fixed abstraction, this reduction is not always optimal. For example, in Fig. 3 the strategy schedules the loop in lockstep, which is crucial for generating a proof with simple (linear) invariants. In particular, Fig. 3 does not admit a witness strategy in the lockstep reduction and does not admit a proof with linear invariants in a sequential reduction. Our verification framework therefore strikes a delicate balance between the clairvoyance needed by the witness strategy and the precision captured in the abstraction, further emphasizing why the searches for reduction and witness strategy need to be mutually dependent.


6.2 Constructing and Solving $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$

Algorithm 1. Iterative solver for $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$
 1: Input: $T$, $\varphi$, $\mathcal{P}$
 2: $G \leftarrow$ initialApproximation($T$, $\varphi$, $\mathcal{P}$)
 3: repeat
 4:   match Solve($G$) with
 5:     case REACH($\sigma$): return REACH
 6:     case SAFE($\sigma$):
 7:       for all $(\hat{s}, M, A) \in$ Restrictions($\sigma$) do
 8:         if $\neg validRes_{\hat{s}}^{M}(A)$ then
 9:           for all $A' \subseteq A$ do
10:             $G$ = Remove($G$, $(\hat{s}, M, A')$)
11:           goto 4
12:       return SAFE

Constructing the game graph of $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$ requires the identification of all valid restrictions (of which there are exponentially many in the number of abstract states and thus doubly exponentially many in the number of predicates), each of which requires solving a quantified SMT query. We propose a more effective algorithm that solves $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$ without constructing it explicitly. Instead, we iteratively refine an abstraction $G$ of $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$. Our method hinges on the following easy observation:


Lemma 1. For any $\hat{s}$ and $M$, $\{A \mid validRes_{\hat{s}}^{M}(A)\}$ is upwards closed (w.r.t. $\subseteq$).


Our initial abstraction consists of all possible restrictions (even those that might be invalid), i.e., we allow all restrictions of the form $(\hat{s}, M, A)$ where $A \subseteq \{\hat{s}' \mid \hat{s} \to_M \hat{s}'\}$.⁴ This overapproximates the power of the safety player, i.e., a winning strategy for SAFE in $G$ may not be valid in $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$. To remedy this, we propose the following inner refinement loop: If we find a winning strategy $\sigma$ for SAFE in $G$, we check if all restrictions chosen by $\sigma$ are valid. If this is the case, $\sigma$ is also winning for $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$, and we can apply Theorem 3. If we find an invalid restriction $(\hat{s}, M, A)$ used by $\sigma$, we refine $G$ by removing not only the restriction $(\hat{s}, M, A)$ but all $(\hat{s}, M, A')$ with $A' \subseteq A$ (which is justified by Lemma 1). The algorithm is sketched in Algorithm 1. The subroutine Restrictions($\sigma$) returns all restrictions used by $\sigma$, i.e., all tuples $(\hat{s}, M, A)$ such that $\sigma$ uses an edge $(\hat{s}, q, b) \rightsquigarrow (\hat{s}, q, b, M, A)$ for some $q, b$. Remove($G$, $(\hat{s}, M, A')$) removes from $G$ all edges of the form $(\hat{s}, q, b) \rightsquigarrow (\hat{s}, q, b, M, A')$ for some $q, b$, and Solve solves a finite-state safety game. To improve the algorithm further, in line 4 we always compute a maximal safety strategy, i.e., a strategy that selects maximal restrictions (w.r.t. $\subseteq$) and therefore allows us to eliminate many invalid restrictions from $G$ simultaneously. For safety games, there always exists such a maximal winning strategy (see, e.g., [11]). Note that while $G$ is large, this finite-state game can be solved very efficiently. The running time of solving $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$ is dominated by the SMT queries, of which our refinement loop, in practice, requires very few.

4 Note that $\{\hat{s}' \mid \hat{s} \to_M \hat{s}'\}$ is always a valid restriction. Importantly, we can compute $\{\hat{s}' \mid \hat{s} \to_M \hat{s}'\}$ locally, i.e., by iterating over abstract states as opposed to sets of abstract states.
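The refinement loop of Algorithm 1 can be sketched in Python on a deliberately simplified game, assuming we drop the automaton state $q$, the vector $b$, and the scheduling $M$: SAFE only picks a restriction $A$ in each abstract state, and REACH then moves to some state in $A$. The oracle `is_valid` stands in for the quantified SMT query and, like all other names here, is hypothetical. `winning_region` solves the finite-state safety game as a greatest fixpoint, and the strategy picks a largest (hence maximal) winning restriction, as suggested in the text.

```python
from itertools import combinations


def all_subsets(xs):
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def winning_region(states, bad, cand):
    """Greatest fixpoint: states from which SAFE can keep the play out of `bad`
    by always choosing a restriction contained in the current region."""
    win = set(states) - set(bad)
    while True:
        nxt = {s for s in win if any(A <= win for A in cand[s])}
        if nxt == win:
            return win
        win = nxt


def lazy_solve(init, bad, succ, is_valid):
    """Simplified sketch of Algorithm 1 (no q, b, or M)."""
    cand = {s: all_subsets(succ[s]) for s in succ}  # initial over-approximation
    queries = 0
    while True:
        win = winning_region(succ, bad, cand)
        if init not in win:
            return "REACH", None, queries  # sound: G over-approximates SAFE
        # maximal strategy: choose a largest restriction that stays winning
        strat = {s: max((A for A in cand[s] if A <= win), key=len) for s in win}
        invalid = []
        for s, A in strat.items():
            queries += 1  # one (expensive) validity query per used restriction
            if not is_valid(s, A):
                invalid.append((s, A))
        if not invalid:
            return "SAFE", strat, queries
        for s, A in invalid:  # Lemma 1: every subset of an invalid A is invalid
            cand[s] = [B for B in cand[s] if not B <= A]


def oracle(minimal):
    """Hypothetical upward-closed validity: A is valid iff it contains a minimal valid set."""
    return lambda s, A: any(m <= A for m in minimal.get(s, []))


succ = {0: {1, 3}, 1: {0, 2}, 2: {2}, 3: {3}}
ok = {0: [frozenset({1})], 1: [frozenset({0, 2})], 2: [frozenset({2})]}
print(lazy_solve(0, {3}, succ, oracle(ok))[0])          # SAFE
bad_choice = {**ok, 0: [frozenset({1, 3})]}
print(lazy_solve(0, {3}, succ, oracle(bad_choice))[0])  # REACH
```

Because the strategy always picks largest winning restrictions, a single failed validity query prunes that restriction together with all of its subsets, which is what keeps the number of oracle calls small in this toy run.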


7 Implementation and Evaluation

When combining Theorem 3 and our iterative solver from Sect. 6.2, we obtain an algorithm to verify $\forall^*\exists^*$-safety properties within a given abstraction. We have implemented a prototype of our method in a tool we call HyPA. We use Z3 [36] to discharge SMT queries. The input of our tool is provided as an arbitrary STS in the SMTLIB format [5], making it language independent. In our programs, we make the program counter explicit, allowing us to track predicates locally [32].

Evaluation for k-Safety. As a special case of $\forall^*\exists^*$ properties, HyPA is also applicable to k-safety verification. We collected an exemplifying suite of programs and k-safety properties from the literature [27,39-41] and manually translated them into STS (this can be automated easily). The results are given in Table 1. As done by Shemer et al. [39], we already provide a set of predicates that is sufficient for some reduction (but not necessarily the lockstep or sequential one), the search for which is then automated by HyPA. Our results show that the game-based search for a reduction can verify interesting k-safety properties from the literature. We also note that, currently, the vast majority of time is spent on the construction of the abstract system. If we were to move to a fixed language, the computation time of the initial abstraction could be reduced by using existing (heavily optimized) abstraction tools [18,32].

Table 1. Evaluation of HyPA on k-safety instances. We give the size of the abstract game-space (Size), the time taken to compute the abstraction (t_abs), and the overall time taken by HyPA (t). Times are given in seconds.

  Instance             Size    t_abs     t
  DoubleSquareNI        819     92.3    92.8
  HalfSquareNI         1166     85.9    86.5
  Array Insert          213     28.2    28.2
  Exp1x3                112      4.5     4.5
  Fig3                  268     11.9    12.0
  DoubleSquareNIff      121      9.8     9.9
  Fig. 2 (…)            333     23.7    23.8
  Collitem-Symm         494     24.0    24.1
  MultEquiv             757     18.9    19.0

Table 2. Evaluation of HyPA on $\forall^*\exists^*$-safety verification instances. We give the size and construction time of the initial abstraction (Size and t_abs). For both the direct (explicit) and lazy (Algorithm 1) solver, we give the time to construct (and solve) the game (t_solve) and the overall time (t = t_abs + t_solve). For the lazy solver, we additionally give the number of refinement iterations (#Ref). Times are given in seconds. TO indicates a timeout after 5 min.

                                    Direct               Lazy
  Instance        Size    t_abs     t_solve    t         #Ref   t_solve    t
  NonDetAdd       4568     3.5      TO         TO         4      1.0        4.5
  CounterSum       479     5.3      9.1        14.4      17      0.9        6.2
  AsynchGNI        437     6.1      6.9        13.0       1      0.1        6.2
  CompilerOpt1     354     2.4      2.3        4.7        2      0.2        2.6
  CompilerOpt2     338     2.8      2.4        5.2        2      0.2        3.0
  Refine          1357     6.1      TO         TO         4      0.7        6.8
  Refine2         1476     5.6      TO         TO         5      0.6        6.2
  Smaller          327     2.3      4.0        6.3       11      0.4        2.7
  CounterDiff      959     8.5      18.3       26.8      19      1.1        9.6
  Fig3            3180    11.1      TO         TO        22      2.9       14.0
  P1 (simple)       83     2.0      1.4        3.4        1      0.4        2.4
  P1 (GNI)       34793    17.0      TO         TO         …     95.7      112.7
  P2 (GNI)       15753    10.2      TO         TO         7      5.1       15.3
  P3 (GNI)        1429     6.6      20.9       27.5       7      0.6        7.2
  P4 (GNI)        7505    16.5      TO         TO         …     13.2       29.7


Evaluation Beyond k-Safety. The main novelty of HyPA lies in its ability to verify, for the first time, temporal properties beyond k-safety. As none of the existing tools can verify such properties, we compiled a collection of very small example programs and $\forall^*\exists^*$-safety properties. Additionally, we modified the boolean programs from [13] (where they checked GNI on boolean programs) by including data from infinite domains. The properties we checked range from refinement properties for compiler optimizations, over general refinement of nondeterministic programs, to generalized non-interference. Verification often requires a non-trivial combination of reduction and witness strategy (as the reduction must, e.g., compensate for branches of different lengths). As before, we provide a set of predicates and let HyPA automatically search for a witness strategy with accompanying reduction. We list the results in Table 2. To highlight the effectiveness of our inner refinement loop, we apply both a direct (explicit) construction of $\mathcal{G}^{\exists}_{(T,\varphi,\mathcal{P})}$ and the lazy (iterative) solver in Algorithm 1. Our lazy solver clearly outperforms an explicit construction and is often the only method to solve the game in reasonable time. In particular, we require very few refinement iterations and therefore also few expensive SMT queries. Unsurprisingly, verifying properties beyond k-safety is much more challenging than k-safety verification, as it involves the synthesis of a witness function, which is already 2EXPTIME-hard for finite-state systems [23,37]. We emphasize that no other existing tool can verify any of the benchmarks.


8 Related Work 


Asynchronous Hyperproperties. Recently, many logics for the formal specification 
of asynchronous hyperproperties have been developed [9,13,17,31]. Our logic 
OHyperLTL is closely related to stuttering HyperLTL ($\text{HyperLTL}_S$) [17]. In $\text{HyperLTL}_S$, each temporal operator is endowed with a set of temporal formulas $\Gamma$, and steps where the truth values of all formulas in $\Gamma$ remain unchanged are ignored during the operator's evaluation. As for most mechanisms used to design asynchronous hyperlogics [9,17,31], finite-state model checking of $\text{HyperLTL}_S$ is undecidable. By contrast, in OHyperLTL, we always observe the trace at a fixed location, which is key for ensuring decidable finite-state model checking.


k-Safety Verification. The literature on k-safety verification is rich. Many 
approaches verify k-safety by using a form of self-composition [8,20, 25,28] and 
often employ reductions to obtain compositions that are easier to verify. Our 
game-based interpretation of a reduction (Sect. 5) is related to Shemer et al. [39], 
who study k-safety verification within a given predicate abstraction using an 
enumeration-based solver (see Sect. 5 for a discussion). Farzan and Vandikas [27] 
present a counterexample-guided refinement loop that simultaneously searches 
for a reduction and a proof. Sousa and Dillig [40] facilitate reductions at the 
source-code level in program logic. 


$\forall^*\exists^*$-Verification. Barthe et al. [7] describe an asymmetric product of the system such that only a subset of the behavior of the second system is preserved, thereby allowing the verification of $\forall^*\exists^*$ properties. Constructing an asymmetric product and verifying its correctness (i.e., showing that the product preserves all behavior of the first, universally quantified, system) is challenging. Unno et al. [41] present a constraint-based approach to verify functional (as opposed to temporal) $\forall^*\exists^*$ properties in infinite-state systems using an extension of constrained Horn clauses called pfwCHC. The underlying verification approach is orthogonal to ours: pfwCHC allows for a clean separation of the actual verification and verification conditions, whereas our approach combines both. For example, our method can prove the existence of a witness strategy without ever formulating precise constraints on the strategy (which seems challenging). Coenen et al. [23] introduce the game-based reading of existential quantification to verify temporal $\forall^*\exists^*$ properties in a synchronous and finite-state setting. By contrast, our work constitutes the first verification method for temporal $\forall^*\exists^*$-safety properties in infinite-state systems. The key to our method is a careful integration of reductions, which is not possible in a synchronous setting. For finite-state systems (where the abstraction is precise) and synchronous specifications (where we observe every step), our method subsumes the one in [23]. Beutner and Finkbeiner [14] use prophecy variables to ensure that the game-based reading of existential quantification is complete in a finite-state setting. Automatically constructing prophecies for infinite-state systems is interesting future work. Pommellet and Touili [38] study the verification of HyperLTL in infinite-state systems arising from pushdown systems. By contrast, we study verification in infinite-state systems that arise from the infinite variable domains used in software.


Game Solving. Our game-based interpretations are naturally related to infinite- 
state game solving [4,16, 26,42]. State-of-the-art solvers for infinite-state games 
unroll the game [26], use necessary subgoals to inductively split a game into 
subgames [4], encode the game as a constraint system [16], and iteratively refine 
the controllable predecessor operator [42]. We tried to encode our verification 
approach directly as an infinite-state linear-arithmetic game. However, existing 
solvers (which, notably, work without a user-provided set of predicates) could not 
solve the resulting game [4,26]. Our method for encoding the witness strategy 
using restrictions corresponds to hyper-must edges in general abstract games [2, 
3]. Our inner refinement loop for solving a game with hyper-must edges without 
explicitly identifying all edges (Algorithm 1) is thus also applicable in general 
abstract games. 


9 Conclusion 


In this work, we have presented the first verification method for temporal hyper- 
properties beyond k-safety in infinite-state systems arising in software. Our 
method is based on a game-based interpretation of reductions and existential 
quantification and allows for mutual dependence of both. Interesting future 
directions include the integration of our method in a counter-example guided 
refinement loop that automatically refines the abstraction and ways to lift the 
current restriction to temporally safe specifications. Moreover, it is interesting to 
study if, and to what extent, the numerous other methods developed for k-safety 
verification of infinite-state systems (apart from reductions) are applicable to the 
vast landscape of hyperproperties that lies beyond k-safety. 
Abstract. Quantified information flow (QIF) has emerged as a rigorous approach to quantitatively measure confidentiality; the information-theoretic underpinning of QIF allows end-users to link the computed quantities with the computational effort required on the part of the adversary to gain access to desired confidential information. In this work, we focus on the estimation of Shannon entropy for a given program $\Pi$. As a first step, we focus on the case wherein a Boolean formula $\varphi(X, Y)$ captures the relationship between inputs $X$ and output $Y$ of $\Pi$. Such formulas $\varphi(X, Y)$ have the property that for every valuation to $X$, there exists exactly one valuation to $Y$ such that $\varphi$ is satisfied. The existing techniques require $O(2^m)$ model counting queries, where $m = |Y|$.

We propose the first efficient algorithmic technique, called EntropyEstimation, to estimate the Shannon entropy of $\varphi$ with PAC-style guarantees, i.e., the computed estimate is guaranteed to lie within a $(1 + \varepsilon)$-factor of the ground truth with confidence at least $1 - \delta$. Furthermore, EntropyEstimation makes only $O(\min(m, n)/\varepsilon^2)$ counting and sampling queries, where $m = |Y|$ and $n = |X|$, thereby achieving a significant reduction in the number of model counting queries. We demonstrate the practical efficiency of our algorithmic framework via a detailed experimental evaluation. Our evaluation demonstrates that the proposed framework scales to formulas beyond the reach of previously known approaches.


1 Introduction 


Over the past half-century, the cost effectiveness of digital services has led to 
an unprecedented adoption of technology in virtually all aspects of our modern 
lives. Such adoption has invariably led to sensitive information being stored in 
data centers around the world and increasingly complex software accessing the 
information in order to provide the services that form the backbone of our mod- 
ern economy and social interactions. At the same time, it is vital that protected 
information does not leak, as such leakages may have grave financial and societal 
consequences. Consequently, the detection and prevention of information leakage
in software have attracted sustained interest in the security community. 

The earliest efforts on information leakage focused on qualitative approaches 
that sought to return a Boolean output of the form “yes” or “no” [11,26,30]. 
While these qualitative approaches successfully capture situations where part of 
the code accesses prohibited information, such approaches are not well-suited 
to situations wherein some information leakage is inevitable. An oft-repeated 
example of such a situation is a password checker wherein every response “incor- 
rect password” does leak information about the secret password. As a result, 
the past decade has seen the rise of quantified information flow analysis (QIF) 
as a rigorous approach to quantitatively measure confidentiality [7,53,57]. The 
information-theoretic underpinnings of QIF analyses allow an end-user to link 
the computed quantities with the probability of an adversary successfully guess- 
ing a secret, or the worst-case computational effort required for the adversary 
to infer the underlying confidential information. Consequently, QIF has been 
applied in diverse use-cases such as software side-channel detection [40], inferring 
search-engine queries through auto-complete response sizes [21], and measuring
the tendency of Linux to leak TCP-session sequence numbers [59]. 

The standard recipe for using the QIF framework is to measure the information leakage from an underlying program $\Pi$ as follows. In a simplified model, a program $\Pi$ maps a set of controllable inputs ($C$) and secret inputs ($I$) to outputs ($O$) observable to an attacker. The attacker is interested in inferring $I$ based on the output $O$. A diverse array of approaches has been proposed to efficiently model $\Pi$, with techniques relying on a combination of symbolic analysis [48], static analysis [24], automata-based techniques [4,5,14], SMT-based techniques [47], and the like. For each, the core underlying technical problem is to determine the leakage of information for a given observation. We often capture this leakage using entropy-theoretic notions, such as Shannon entropy [7,16,48,53] or min-entropy [7,44,48,53]. In this work, we focus on computing Shannon entropy.

In this work, we focus on entropy estimation for programs modeled by Boolean formulas; nevertheless, our techniques are general and can be extended to other models such as automata-based frameworks. Let a formula $\varphi(X, Y)$ capture the relationship between $X$ and $Y$ such that for every valuation to $X$ there is at most one valuation to $Y$ such that $\varphi$ is satisfied; one can view $X$ as the set of inputs and $Y$ as the set of outputs. Let $m = |Y|$ and $n = |X|$. Let $p$ be a probability distribution over $\{0,1\}^{Y}$ such that for every assignment to $Y$, $\sigma : Y \to \{0,1\}$, we have $p_\sigma = \frac{|sol(\varphi(Y \mapsto \sigma))|}{|sol(\varphi)|}$, where $sol(\varphi(Y \mapsto \sigma))$ denotes the set of solutions of $\varphi(Y \mapsto \sigma)$. Then, the entropy of $\varphi$ is defined as $H_\varphi(Y) = \sum_{\sigma} p_\sigma \log \frac{1}{p_\sigma}$.
The past decade has witnessed a multitude of entropy estimation techniques with varying guarantees on the quality of their estimates [9,17,35,58]. The problem of computing the entropy of a distribution represented by a given circuit is closely related to the EntropyDifference problem considered by Goldreich and Vadhan [34], and shown to be SZK-complete. We therefore do not expect to obtain polynomial-time algorithms for this problem. The techniques that have been proposed to compute $H_\varphi(Y)$ exactly compute $p_\sigma$ for each $\sigma$. Observe that computing $p_\sigma$ is equivalent to the problem of model counting, which seeks to compute the number of solutions of a given formula. Therefore, the exact techniques require $O(2^m)$ model-counting queries [13,27,39]; consequently, such techniques often do not scale for large values of $m$. Accordingly, the state of the art often relies on sampling-based techniques that perform well in practice but can only provide lower or upper bounds on the entropy [37,49]. As is often the case, techniques that only guarantee lower or upper bounds can output estimates that can be arbitrarily far from the ground truth. This raises the question: can we design efficient techniques for approximate estimation whose estimates have PAC-style $(\varepsilon, \delta)$ guarantees? I.e., can we compute an estimate that is guaranteed to lie within a $(1 + \varepsilon)$-factor of the ground truth for all possible values, with confidence at least $1 - \delta$?
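To illustrate why exact techniques need one model-counting query per output valuation $\sigma$, the following Python sketch computes $H_\varphi(Y)$ exactly for a tiny circuit given as a Python function from input bits to output bits. The brute-force `count_solutions` is a hypothetical stand-in for a real model counter; we assume every input has exactly one output, so $|sol(\varphi)| = 2^n$, and we use base-2 logarithms. All names are illustrative only.

```python
import math
from itertools import product


def count_solutions(circuit, n, sigma):
    """Brute-force stand-in for a model counter: |sol(phi(Y -> sigma))|."""
    return sum(1 for x in product((0, 1), repeat=n) if circuit(x) == sigma)


def exact_entropy(circuit, n, m):
    """Exact H_phi(Y): one counting query for each of the 2^m output valuations."""
    total = 2 ** n  # |sol(phi)|, since every input has exactly one output
    h, queries = 0.0, 0
    for sigma in product((0, 1), repeat=m):
        queries += 1
        p = count_solutions(circuit, n, sigma) / total
        if p > 0:  # 0 * log(0) is taken to be 0
            h -= p * math.log2(p)
    return h, queries


def checker(x):
    """Password checker on 3-bit passwords: the single output bit says 'correct'."""
    return (int(x == (1, 0, 1)),)


print(exact_entropy(checker, 3, 1))        # about (0.5436, 2)
print(exact_entropy(lambda x: x, 2, 2))    # (2.0, 4): the identity leaks both bits
```

The loop over all $2^m$ valuations of $Y$ is exactly the bottleneck described above: the number of counting queries doubles with every additional output bit.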

The primary contribution of our work is the first efficient algorithmic technique (given in our algorithm EntropyEstimation) to estimate $H_\varphi(Y)$ with PAC-style guarantees for all possible values of $H_\varphi(Y)$. In particular, given a formula $\varphi$, EntropyEstimation returns an estimate that is guaranteed to lie within a $(1 + \varepsilon)$-factor of $H_\varphi(Y)$ with confidence at least $1 - \delta$. We stress that we obtain such a multiplicative estimate even when $H_\varphi(Y)$ is very small, as in the case of a password checker as described above. Furthermore, EntropyEstimation makes only $O(\min(m, n)/\varepsilon^2)$ counting and sampling queries even though the support of the distribution specified by $\varphi$ can be of size $O(2^m)$.
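As a rough intuition for why sampling and counting queries can replace exhaustive enumeration, note that $H_\varphi(Y) = \mathbb{E}_{\sigma \sim p}[\log(1/p_\sigma)]$. The following Python sketch draws $\sigma$ by uniformly sampling a solution (for a circuit formula, a uniform input) and averages $\log_2(1/p_\sigma)$, using one counting query per sample. It is a plain Monte Carlo average under these simplifying assumptions, not the EntropyEstimation algorithm, and it carries no $(\varepsilon, \delta)$ guarantee; all names are hypothetical, and brute-force enumeration stands in for the sampler and the counter.

```python
import math
import random
from itertools import product


def count_output(circuit, n, sigma):
    """Brute-force stand-in for a counting query: |sol(phi(Y -> sigma))|."""
    return sum(1 for x in product((0, 1), repeat=n) if circuit(x) == sigma)


def monte_carlo_entropy(circuit, n, samples, seed=0):
    """Average of log2(1/p_sigma) over sigma drawn with probability p_sigma."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        # sampling query: a uniform solution of a circuit formula is a uniform input
        x = tuple(rng.randint(0, 1) for _ in range(n))
        sigma = circuit(x)
        p = count_output(circuit, n, sigma) / 2 ** n  # counting query
        acc -= math.log2(p)
    return acc / samples


def checker(x):
    return (int(x == (1, 0, 1)),)  # 3-bit password checker


print(monte_carlo_entropy(lambda x: x, 3, 50))  # 3.0: every sample has p = 1/8
print(monte_carlo_entropy(checker, 3, 2000))    # close to the exact value 0.5436
```

The number of queries here depends only on the number of samples, not on $2^m$; obtaining a provable multiplicative guarantee from such samples is the part that requires the paper's analysis.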

While the primary focus of the work is theoretical, we seek to demonstrate 
that our techniques can be translated into practically efficient algorithms. As 
such, we focused on developing a prototype using off-the-shelf samplers and coun- 
ters. As a first step, we use GANAK [52] for model counting queries and SPUR [3] 
for sampling queries. Our empirical analysis demonstrates that EntropyEstimation 
can be translated into practice and achieves a significant speedup over the baseline.

It is worth mentioning that recent approaches in quantified information leak- 
age focus on programs that can be naturally translated to string and SMT 
constraints, and therefore, employ model counters for string and SMT con- 
straints. Since counting and sampling are closely related, we hope the algorith- 
mic improvements attained by EntropyEstimation will lead to the development of 
samplers in the context of SMT and string constraints, and would lead to prac- 
tical implementation of EntropyEstimation for other domains. We stress again 
that while we present EntropyEstimation for programs modeled as a Boolean formula, our analysis applies to other approaches, such as automata-based approaches,
modulo access to the appropriate sampling and counting oracles. 

The rest of the paper is organized as follows: we present the notations and pre- 
liminaries in Sect. 2. We then discuss related work in Sect. 3. Next, we present an 
overview of EntropyEstimation including a detailed description of the algorithm 
and an analysis of its correctness in Sect. 4. We then describe our experimental 
methodology and discuss our results with respect to the accuracy and scalability 
of EntropyEstimation in Sect. 5. Finally, we conclude in Sect. 6. 
2 Preliminaries


We use lower case letters (with subscripts) to denote propositional variables and upper case letters to denote a subset of variables. The formula $\exists Y \varphi(X, Y)$ is existentially quantified in $Y$, where $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_m\}$. For notational clarity, we use $\varphi$ to refer to $\varphi(X, Y)$ when clear from the context. We denote by $Vars(\varphi)$ the set of variables appearing in $\varphi(X, Y)$. A literal is a Boolean variable or its negation.

A satisfying assignment or solution of a formula $\varphi$ is a mapping $\tau : Vars(\varphi) \to \{0,1\}$ on which the formula evaluates to True. For $V \subseteq Vars(\varphi)$, $\tau_{\downarrow V}$ represents the truth values of variables in $V$ in a satisfying assignment $\tau$ of $\varphi$. We denote the set of all the solutions of $\varphi$ as $sol(\varphi)$. For $S \subseteq Vars(\varphi)$, we define $sol(\varphi)_{\downarrow S}$ as the set of solutions of $\varphi$ projected on $S$.

The problem of model counting is to compute $|sol(\varphi)|$ for a given formula $\varphi$. Projected model counting is defined analogously using $sol(\varphi)_{\downarrow S}$ instead of $sol(\varphi)$, for a given projection set¹ $S \subseteq Vars(\varphi)$. A uniform sampler outputs a solution $y \in sol(\varphi)$ such that $\Pr[y \text{ is output}] = \frac{1}{|sol(\varphi)|}$.

We say that $\varphi$ is a circuit formula if for all assignments $\tau_1, \tau_2 \in sol(\varphi)$, we have $\tau_{1\downarrow X} = \tau_{2\downarrow X} \implies \tau_1 = \tau_2$. It is worth remarking that if $\varphi$ is a circuit formula, then $X$ is an independent support.
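The definitions of $sol(\varphi)$, projection, and circuit formulas can be checked by brute force on tiny formulas. The following Python sketch is only meant to make these definitions concrete: formulas are represented as Python predicates over assignments (dicts from variable names to 0/1), and the names `solutions`, `project`, and `is_circuit_formula` are hypothetical; real tools operate on CNF with dedicated counters and samplers.

```python
from itertools import product


def solutions(phi, variables):
    """sol(phi): all satisfying assignments of phi over `variables`."""
    sols = []
    for bits in product((0, 1), repeat=len(variables)):
        tau = dict(zip(variables, bits))
        if phi(tau):
            sols.append(tau)
    return sols


def project(sols, S):
    """sol(phi) projected on S: the distinct restrictions of the solutions to S."""
    return {tuple(tau[v] for v in S) for tau in sols}


def is_circuit_formula(sols, X):
    """Check that tau1|X == tau2|X implies tau1 == tau2 for all solutions."""
    seen = {}
    for tau in sols:
        key = tuple(tau[v] for v in X)
        full = tuple(sorted(tau.items()))
        if seen.setdefault(key, full) != full:
            return False
    return True


# y1 <-> (x1 and x2): every input has exactly one output, so it is a circuit formula.
and_gate = solutions(lambda t: t["y1"] == (t["x1"] & t["x2"]), ["x1", "x2", "y1"])
print(len(and_gate), len(project(and_gate, ["y1"])))  # 4 2
print(is_circuit_formula(and_gate, ["x1", "x2"]))     # True
# x1 or y1: for x1 = 1 both values of y1 satisfy it, so it is not a circuit formula.
not_circuit = solutions(lambda t: t["x1"] | t["y1"], ["x1", "y1"])
print(is_circuit_formula(not_circuit, ["x1"]))        # False
```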

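For a circuit formula $\varphi(X, Y)$ and for $\sigma : Y \to \{0,1\}$, we define $p_\sigma = \frac{|sol(\varphi(Y \mapsto \sigma))|}{|sol(\varphi)|}$. Given a circuit formula $\varphi(X, Y)$, we define the entropy of $\varphi$, denoted by $H_\varphi(Y)$, as follows: $H_\varphi(Y) = -\sum_{\sigma \in 2^Y} p_\sigma \log(p_\sigma)$.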


3 Related Work 


The Shannon entropy is a fundamental concept in information theory, and as such has been studied by theoreticians and practitioners alike. While this is, to the best of our knowledge, the first work that provides Probably Approximately Correct (PAC) $(\varepsilon, \delta)$-approximation guarantees for all values of the entropy while requiring only logarithmically (in the size of the support of the distribution) many queries, we survey below prior work relevant to ours.

Goldreich and Vadhan [34] showed that the problem of estimating the entropy for circuit formulas is complete for statistical zero-knowledge. Estimation of the entropy via collision probabilities has been considered in the statistical physics community, but these techniques only provide lower bounds [43,55]. Batu et al. [9] considered entropy estimation in a black-box model wherein one is allowed to sample $\sigma \in 2^Y$ with probability proportional to $p_\sigma$, and $p_\sigma$ is revealed along with the sample $\sigma$. Batu et al. showed that any algorithm that can estimate the entropy within a factor of 2 in this model must use $\Omega(2^{m/8})$ samples. Furthermore, Batu et al. proposed a multiplicative approximation scheme assuming a lower bound on $H$; precisely, it requires a number of samples that grows linearly with $1/H$. Their scheme also gives rise to an additive approximation scheme.


1 Projection set has been referred to as sampling set in prior work [19,54]. 
Guha et al. [35] improved Batu et al.'s scheme to obtain $(\varepsilon, \delta)$ multiplicative estimates using O(…) samples, matching Batu et al.'s lower bound. Note that this grows with $1/H$.

A more restrictive model has been considered wherein we only get access to samples (with the assurance that every $\sigma$ is sampled with probability proportional to $p_\sigma$). Valiant and Valiant [58] obtained an asymptotically optimal algorithm in this setting, which requires O(…) samples to obtain an $\varepsilon$-additive approximation. Chakraborty et al. [17] considered the problem in a different setting, in which the algorithm is given the ability to sample $\sigma$ from a conditional distribution: the algorithm is permitted to specify a set $S$, and obtains $\sigma$ from the distribution conditioned on $\sigma \in S$. We remark that, as discussed below, our approach makes use of such conditional samples by sampling from a modified formula that conjoins the circuit formula with a formula for membership in $S$. In any case, Chakraborty et al. use O(…) conditional samples to approximately learn the distribution, and can only provide an additive approximation of entropy. A helpful survey of all of these different models and algorithms was recently given by Canonne [15].

In this paper, we rely on the advances in model counting. Theoretical inves- 
tigations into model counting were initiated by Valiant in his seminal work that 
defined the complexity class #P and showed that the problem of model counting 
is #P-complete. From a practical perspective, the earliest work on model count- 
ing [12] focused on improving enumeration-based strategies via partial solutions. 
Subsequently, Bayardo and Pehoushek [10] observed that if a formula can be 
partitioned into subsets of clauses, also called components, such that each of 
the subsets is over disjoint sets of variables, then the model count of the for- 
mula is the product of the model counts of each of the components. Building on 
Bayardo and Pehoushek’s scheme, Sang et al. [50] showed how conflict-driven 
clause learning can be combined with component caching, which has been fur- 
ther improved by Thurley [56] and Sharma et al. [52]. Another line of work 
focuses on compilation-based techniques, wherein the core approach is to compile the input formula into a subset $\mathcal{L}$ of negation normal form, so that counting is tractable for $\mathcal{L}$. The past five years have witnessed a surge of interest in the design of projected model counters [6,18,20,42,45,52]. In this paper, we employ GANAK [52], a state-of-the-art projected model counter; an entry based on GANAK won the projected model counting track at the 2020 model counting competition [31].
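To make Bayardo and Pehoushek's component observation concrete, here is a small brute-force check in Python. It is only an illustration under our own assumptions: the DIMACS-like clause encoding and the helpers `count_models` and `variables_of` are hypothetical, and no practical model counter works by enumeration.

```python
import itertools

def count_models(clauses, variables):
    """Brute-force model count of a CNF over `variables`.

    Literals follow a DIMACS-like convention: v means variable v is true,
    -v means variable v is false.
    """
    variables = sorted(variables)
    total = 0
    for bits in itertools.product((False, True), repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            total += 1
    return total

def variables_of(clauses):
    return {abs(lit) for clause in clauses for lit in clause}

# Two components over the disjoint variable sets {1, 2} and {3, 4, 5}.
comp_a = [[1, 2]]            # x1 or x2
comp_b = [[-3, 4], [4, 5]]   # (not x3 or x4) and (x4 or x5)
formula = comp_a + comp_b

whole = count_models(formula, variables_of(formula))
product = (count_models(comp_a, variables_of(comp_a))
           * count_models(comp_b, variables_of(comp_b)))
print(whole, product)  # 15 15
```

Component-caching counters exploit exactly this factorization, but detect components on partial assignments during search rather than once up front.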

Another crucial ingredient for our technique is access to an efficient sampler. 
Counting and sampling are closely related problems, and therefore, the devel- 
opment of efficient counters spurred the research on the development of sam- 
plers. In a remarkable result, Huang and Darwiche [36] showed that the traces 
of model counters are in d-DNNF (deterministic Decomposable Negation Nor- 
mal Form [25]), which was observed to support sampling in polynomial time [51]. 
Achlioptas, Hammoudeh, and Theodoropoulos [3] observed that one can improve 
the space efficiency by performing an on-the-fly traversal of the underlying trace 
of a model counter such as SharpSAT [56]. 
Our work builds on a long line of work in the QIF community that identified a close relationship between quantitative information flow and model counting [4,5,27,33,38,59]. There are also many symbolic-execution-based approaches for QIF based on model counting that would require a number of model counting calls linear in the size of the observable domain, that is, exponential in the number of bits needed to represent the domain [8,46]. Another closely related line of work concerns the use of model counting in side-channel analysis [28,29,33]. Similarly, there exist sampling-based approaches for black-box leakage estimation that either require too many samples to converge, much larger than the product of the sizes of the input and output domains [23], or use ML-based approaches that predict the error of the ideal classifier for predicting secrets given observables [22]. However, these approaches cannot provide PAC guarantees on the estimation. While we focus on the case where the behavior of a program can be modeled with a Boolean formula $\varphi$, the underlying technique is general and can be extended to cases where programs (and their abstractions) are modeled using automata [4,5,14].

Before concluding our discussion of prior work, we remark that Köpf and Rybalchenko [41] used Batu et al.'s [9] lower bounds to conclude that their scheme could not be improved without usage of structural properties of the program. In this context, our paper continues the direction alluded to by Köpf and Rybalchenko and designs the first efficient multiplicative approximation scheme by utilizing white-box access to the program.


4 EntropyEstimation: Efficient Estimation of $H(\varphi)$


In this section, we focus on the primary technical contribution of our work: an algorithm, called EntropyEstimation, that takes a circuit formula $\varphi(X,Y)$ and returns an $(\varepsilon,\delta)$ estimate of $H(\varphi)$. We first provide a detailed technical overview of the design of EntropyEstimation in Sect. 4.1, then provide a detailed description of the algorithm, and finally, provide the accompanying technical analysis of the correctness and complexity of EntropyEstimation.


4.1 Technical Overview 


At a high level, EntropyEstimation uses a median-of-means estimator, i.e., we first estimate $H(\varphi)$ to within a $(1\pm\varepsilon)$-factor with probability at least $\frac{5}{6}$ by computing the mean of the underlying estimator and then take the median of many such estimates to boost the probability of correctness to $1-\delta$.
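The median-of-means idea can be sketched generically in Python. This is a minimal illustration rather than the paper's implementation: the helper `median_of_means`, the toy output distribution, and the group sizes are assumptions chosen only to show the mechanics. Each group mean is a Chebyshev-style estimate of the expected self-information, and the median over groups boosts the confidence.

```python
import math
import random
import statistics

def median_of_means(draw, t, num_groups):
    """Median of num_groups sample means, each averaging t i.i.d. draws."""
    means = [sum(draw() for _ in range(t)) / t for _ in range(num_groups)]
    return statistics.median(means)

# Toy output distribution (an assumption for illustration); its entropy is 1.75 bits.
probs = [0.5, 0.25, 0.125, 0.125]
rng = random.Random(1)

def draw_self_information():
    p = rng.choices(probs, weights=probs)[0]  # draw sigma with probability p_sigma
    return math.log2(1 / p)                   # self-information g(sigma) = log(1/p_sigma)

estimate = median_of_means(draw_self_information, t=200, num_groups=15)
print(round(estimate, 3))
```

The median is used instead of a plain average because a single badly skewed group mean cannot move it, which is what lets the confidence grow with the number of groups.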

Let us consider a random variable $S$ over the domain $\mathsf{sol}(\varphi)_{\downarrow Y}$ such that $\Pr[S = \sigma] = p_\sigma$ wherein $\sigma \in \mathsf{sol}(\varphi)_{\downarrow Y}$, and consider the self-information function $g : \mathsf{sol}(\varphi)_{\downarrow Y} \to [0,\infty)$, given by $g(\sigma) = \log\frac{1}{p_\sigma}$. Observe that the entropy $H(\varphi) = \mathsf{E}[g(S)]$. Therefore, a simple estimator would be to sample $S$ using our oracle and then estimate the expectation of $g(S)$ by a sample mean. At this point, we observe that given access to a uniform sampler, UnifSample, we can simply first sample $\tau \in \mathsf{sol}(\varphi)$ uniformly at random, and then set $S = \tau_{\downarrow Y}$, which gives $\Pr[S = \tau_{\downarrow Y}] = p_{\tau_{\downarrow Y}}$. Furthermore, observe that $g(\sigma)$ can be computed via a query to a model counter. In their seminal work, Batu et al. [9] observed that the variance of $g(S)$, denoted by $\mathsf{variance}[g(S)]$, can be at most $m^2$. The required number of sample queries, based on a straightforward analysis, would be $\Theta\left(\frac{\mathsf{variance}[g(S)]}{\varepsilon^2(\mathsf{E}[g(S)])^2}\right)$, that is, up to $O\left(\frac{m^2}{\varepsilon^2(\mathsf{E}[g(S)])^2}\right)$. However, $\mathsf{E}[g(S)] = H(\varphi)$ can be arbitrarily close to 0, and therefore, this does not provide a reasonable upper bound on the required number of samples.
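The following Python sketch illustrates why the naive bound is not useful. Under assumptions of our own (a hypothetical helper `relative_variance` and a toy family of distributions with one dominant output among $2^m$ outputs), shrinking the mass outside the dominant output drives $H(\varphi)$ toward 0, and the ratio $\mathsf{variance}[g(S)]/(\mathsf{E}[g(S)])^2$, which governs the number of samples, grows without bound.

```python
import math

def relative_variance(probs):
    """variance[g(S)] / E[g(S)]^2 where g(sigma) = log2(1/p_sigma)."""
    mean = sum(p * math.log2(1 / p) for p in probs if p > 0)
    second = sum(p * math.log2(1 / p) ** 2 for p in probs if p > 0)
    return (second - mean ** 2) / mean ** 2

m = 10
rest = 2 ** m - 1
ratios = []
for leftover in (1e-1, 1e-3, 1e-6):
    # One dominant output; the leftover mass is spread over the other 2^m - 1 outputs.
    probs = [1 - leftover] + [leftover / rest] * rest
    ratios.append(relative_variance(probs))
print([round(x, 1) for x in ratios])
```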

To address the lack of a lower bound on $H(\varphi)$, we observe that for $\varphi$ to have $H(\varphi) < 1$, there must exist $\sigma_{high} \in \mathsf{sol}(\varphi)_{\downarrow Y}$ such that $p_{\sigma_{high}} > \frac{1}{2}$. We then observe that, given access to a sampler and a counter, we can identify such a $\sigma_{high}$ with high probability, thereby allowing us to consider the two cases separately: (A) $H(\varphi) > 1$ and (B) $H(\varphi) < 1$. Now, for case (A), we could use Batu et al.'s bound for $\mathsf{variance}[g(S)]$ [9] and obtain an estimator that would require $O\left(\frac{m^2}{\varepsilon^2}\right)$ sampling and counting queries. It is worth remarking that the bound $\mathsf{variance}[g(S)] \le m^2$ is indeed tight as a uniform distribution over $\mathsf{sol}(\varphi)_{\downarrow X}$ would achieve the bound. Therefore, we instead focus on the expression $\frac{\mathsf{variance}[g(S)]}{(\mathsf{E}[g(S)])^2}$ and prove that, for the case when $\mathsf{E}[g(S)] = H(\varphi) > h$, we can upper bound $\frac{\mathsf{variance}[g(S)]}{(\mathsf{E}[g(S)])^2}$ by $(1+o(1))\cdot\frac{m}{h}$, thereby reducing the complexity from $m^2$ to $m$ (observe that we have $H(\varphi) > 1$, that is, we can take $h = 1$).

Now, we return to case (B), wherein we have identified $\sigma_{high} \in \mathsf{sol}(\varphi)_{\downarrow Y}$ with $p_{\sigma_{high}} > \frac{1}{2}$. Let $r = p_{\sigma_{high}}$ and $H_{rem} = \sum_{\sigma \in \mathsf{sol}(\varphi)_{\downarrow Y} \setminus \sigma_{high}} p_\sigma\log\frac{1}{p_\sigma}$. Note that $H(\varphi) = r\log\frac{1}{r} + H_{rem}$. Therefore, we focus on estimating $H_{rem}$. To this end, we define a random variable $T$ that takes values in $\mathsf{sol}(\varphi)_{\downarrow Y} \setminus \sigma_{high}$ such that $\Pr[T = \sigma] = \frac{p_\sigma}{1-r}$. Using the function $g$ defined above, we have $H_{rem} = (1-r)\cdot\mathsf{E}[g(T)]$. Again, we have two cases, depending on whether $H_{rem} > 1$ or not; if it is, then we can upper bound $\frac{\mathsf{variance}[g(T)]}{(\mathsf{E}[g(T)])^2}$ similarly to case (A). If not, we observe that the denominator is at least 1 for $r > 1/2$, since every remaining $p_\sigma \le 1 - r < \frac{1}{2}$ gives $g(\sigma) > 1$. And, when $H_{rem}$ is so small, we can upper bound the numerator by $(1+o(1))m$, giving overall $\frac{\mathsf{variance}[g(T)]}{(\mathsf{E}[g(T)])^2} \le (1+o(1))\cdot m$. We can thus estimate $H_{rem}$ using the median-of-means estimator.

4.2 Algorithm Description 


Algorithm 1 presents the proposed algorithmic framework EntropyEstimation. EntropyEstimation takes a formula $\varphi(X,Y)$, a tolerance parameter $\varepsilon$, and a confidence parameter $\delta$ as input, and returns an estimate $\hat h$ of the entropy $H_\varphi(Y)$ that is guaranteed to lie within a $(1\pm\varepsilon)$-factor of $H_\varphi(Y)$ with confidence at least $1-\delta$. Algorithm 1 assumes access to the following subroutines:


ComputeCount: The subroutine ComputeCount takes a formula $\varphi(X,Y)$ and a projection set $V \subseteq X \cup Y$ as input, and returns the projected model count of $\varphi(X,Y)$ over $V$.

UnifSample: The subroutine UnifSample takes a formula $\varphi(X,Y)$ as input and returns a uniformly sampled satisfying assignment of $\varphi(X,Y)$.
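Algorithm 1. EntropyEstimation($\varphi(X,Y)$, $\varepsilon$, $\delta$)
1: $m \leftarrow |Y|$; $n \leftarrow |X|$
2: $z \leftarrow$ ComputeCount($\varphi(X,Y)$, $X$)
3: for $i \in [1, \lceil\log(10/\delta)\rceil]$ do
4:     $\tau \leftarrow$ UnifSample($\varphi$)
5:     $r \leftarrow z^{-1}\cdot$ ComputeCount($\varphi(X,Y) \wedge (Y = \tau_{\downarrow Y})$, $X$)
6:     if $r > \frac{1}{2}$ then
7:         $\varphi'(X,Y) \leftarrow \varphi(X,Y) \wedge (Y \ne \tau_{\downarrow Y})$
8:         $t \leftarrow \frac{6}{\varepsilon^2}\cdot\min\left\{\frac{n}{2\log\frac{1}{1-r}},\ m + \log(m + \log m + 2.5)\right\}$
9:         $\hat h_{rem} \leftarrow$ SampleEst($\varphi'$, $z$, $t$, $0.9\cdot\delta$)
10:        $\hat h \leftarrow (1-r)\,\hat h_{rem} + r\log\frac{1}{r}$
11:        return $\hat h$
12: $t \leftarrow \frac{6}{\varepsilon^2}\cdot\left(\min\{n,\ m + \log(m + \log m + 1.1)\} - 1\right)$
13: $\hat h \leftarrow$ SampleEst($\varphi$, $z$, $t$, $0.9\cdot\delta$)
14: return $\hat h$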


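Algorithm 2. SampleEst($\varphi$, $z$, $t$, $\delta$)
1: $C \leftarrow [\,]$
2: $T \leftarrow \lceil\frac{9}{2}\log\frac{2}{\delta}\rceil$
3: for $i \in [1, T]$ do
4:     $est \leftarrow 0$
5:     for $j \in [1, t]$ do
6:         $\tau \leftarrow$ UnifSample($\varphi$)
7:         $r \leftarrow z^{-1}\cdot$ ComputeCount($\varphi(X,Y) \wedge (Y = \tau_{\downarrow Y})$, $X$)
8:         $est \leftarrow est + \log(1/r)$
9:     C.Append($est/t$)
10: return Median($C$)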


SampleEst: Algorithm 2 presents the subroutine SampleEst, which also assumes access to the ComputeCount and UnifSample subroutines. SampleEst takes as input a formula $\varphi(X,Y)$; the projected model count of $\varphi(X,Y)$ over $X$, $z$; the number of required samples, $t$; and a confidence parameter $\delta$, and returns a median-of-means estimate of the entropy. Algorithm 2 starts off by computing the value of $T$, the required number of repetitions to ensure at least $1-\delta$ confidence for the estimate. The algorithm has two loops: one outer loop (Lines 3-9) and one inner loop (Lines 5-8). The outer loop runs for $\lceil\frac{9}{2}\log\frac{2}{\delta}\rceil$ rounds, where in each round, Algorithm 2 updates a list $C$ with the mean estimate $est/t$. In each round of the inner loop, Algorithm 2 updates the value of $est$: Line 6 draws a sample $\tau$ using the UnifSample($\varphi(X,Y)$) subroutine. At Line 7, the value of $r$ is computed as the ratio of the projected model count over $X$ of $\varphi(X,Y) \wedge (Y = \tau_{\downarrow Y})$ to $z$. To compute the projected model count, Algorithm 2 calls the subroutine ComputeCount on input $(\varphi(X,Y) \wedge (Y = \tau_{\downarrow Y}), X)$. At line 8, $est$ is incremented by $\log\frac{1}{r}$, and at line 9, the final mean $est/t$ is added to $C$. Finally, at line 10, Algorithm 2 returns the median of $C$.
Returning to Algorithm 1, it starts by computing the value of $z$ as the projected model count of $\varphi(X,Y)$ over $X$ at line 2. The projected model count is computed by calling the ComputeCount subroutine. Next, Algorithm 1 attempts to determine whether or not there exists an output $\sigma_{high}$ with probability greater than 1/2 by iterating over lines 3-11 for $\lceil\log(10/\delta)\rceil$ rounds. Line 4 draws a sample $\tau$ by calling the UnifSample($\varphi(X,Y)$) subroutine. Line 5 computes the value of $r$ by taking the ratio of the projected model count of $\varphi(X,Y) \wedge (Y = \tau_{\downarrow Y})$ to $z$. Line 6 checks whether the value of $r$ is greater than 1/2, and chooses one of the two paths based on the value of $r$:


1. If the value of $r$ turns out to be greater than 1/2, the formula $\varphi(X,Y)$ is updated to $\varphi(X,Y) \wedge (Y \ne \tau_{\downarrow Y})$ at line 7. The resulting formula is denoted by $\varphi'(X,Y)$. Then, the required number of samples, $t$, is calculated as shown at line 8. At line 9, the subroutine SampleEst is called with $\varphi'(X,Y)$, $z$, $t$, and $0.9\times\delta$ as arguments to compute the estimate $\hat h_{rem}$. Finally, the algorithm computes the estimate $\hat h$ at line 10.

2. If the value of $r$ is at most 1/2 in every round, the number of samples we use, $t$, is calculated as shown at line 12. At line 13, the subroutine SampleEst is called with $\varphi(X,Y)$, $z$, $t$, and $0.9\times\delta$ as arguments to compute the estimate $\hat h$.
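For readers who want to experiment, the following Python sketch mirrors the control flow of Algorithms 1 and 2 on tiny examples. It is not the prototype evaluated in Sect. 5, and every name in it is hypothetical: brute-force enumeration (`compute_count`) stands in for a projected model counter such as GANAK, rejection sampling over uniformly random inputs (`unif_sample`) stands in for a uniform sampler such as SPUR, and the circuit formula is assumed to be given as a Python function from input bits to output bits. Logarithms are base 2, $T$ in SampleEst uses the natural logarithm, and the early return for $r = 1$ and the clamp $t \ge 1$ are our own guards for degenerate inputs.

```python
import collections
import itertools
import math
import random
import statistics

# Assumptions: f maps an input bit-tuple x (|X| = n) to its unique output
# bit-tuple y (|Y| = m); enumeration replaces the model counter and
# rejection sampling replaces the uniform sampler.

def compute_count(f, n, sigma):
    """Projected count over X of phi and (Y = sigma), by enumeration."""
    return sum(1 for x in itertools.product((0, 1), repeat=n) if f(x) == sigma)

def unif_sample(f, n, rng, excluded=None):
    """Y-part of a uniform solution of phi (or of phi and Y != excluded)."""
    while True:
        y = f(tuple(rng.randint(0, 1) for _ in range(n)))
        if y != excluded:
            return y

def sample_est(f, n, z, t, delta, rng, excluded=None):
    """Median of T means of t self-information samples (structure of Algorithm 2)."""
    T = math.ceil(4.5 * math.log(2 / delta))
    means = []
    for _ in range(T):
        est = 0.0
        for _ in range(t):
            y = unif_sample(f, n, rng, excluded)
            r = compute_count(f, n, y) / z
            est += math.log2(1 / r)
        means.append(est / t)
    return statistics.median(means)

def entropy_estimation(f, n, m, eps, delta, seed=0):
    """Control flow of Algorithm 1; assumes m >= 2."""
    rng = random.Random(seed)
    z = 2 ** n  # a circuit formula has exactly one solution per input
    for _ in range(math.ceil(math.log2(10 / delta))):
        y = unif_sample(f, n, rng)
        r = compute_count(f, n, y) / z
        if r > 0.5:
            if r == 1.0:
                return 0.0  # our guard: constant output, nothing left to sample
            bound = min(n / (2 * math.log2(1 / (1 - r))),
                        m + math.log2(m + math.log2(m) + 2.5))
            t = max(1, math.ceil(6 / eps ** 2 * bound))
            h_rem = sample_est(f, n, z, t, 0.9 * delta, rng, excluded=y)
            return (1 - r) * h_rem + r * math.log2(1 / r)
    bound = min(n, m + math.log2(m + math.log2(m) + 1.1)) - 1
    t = max(1, math.ceil(6 / eps ** 2 * bound))
    return sample_est(f, n, z, t, 0.9 * delta, rng)

def exact_entropy(f, n):
    """Reference value by full enumeration (for checking the sketch)."""
    counts = collections.Counter(f(x) for x in itertools.product((0, 1), repeat=n))
    z = 2 ** n
    return sum(c / z * math.log2(z / c) for c in counts.values())

def skewed(x):  # output (0, 0) has probability 7/8
    return (x[0] & x[1] & x[2], 0)

def mixed(x):   # no output has probability above 1/2
    return (x[0] & x[1], x[2])

print(entropy_estimation(skewed, 3, 2, eps=0.5, delta=0.1), exact_entropy(skewed, 3))
print(entropy_estimation(mixed, 4, 2, eps=0.5, delta=0.1), exact_entropy(mixed, 4))
```

Because every count is computed by enumeration, this sketch is exponential in $n$; it is only meant for sanity-checking the estimator against the exact entropy on toy formulas. In the skewed example, every sample drawn after excluding the heavy output has the same self-information, so the estimate coincides with the exact value.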


4.3 Theoretical Analysis 


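Theorem 1. Given a circuit formula $\varphi$ with $|Y| \ge 2$, a tolerance parameter $\varepsilon > 0$, and a confidence parameter $\delta > 0$, the algorithm EntropyEstimation returns $\hat h$ such that
$$\Pr\left[(1-\varepsilon)H_\varphi(Y) \le \hat h \le (1+\varepsilon)H_\varphi(Y)\right] \ge 1-\delta.$$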
We first analyze the median-of-means estimator computed by SampleEst. 


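Lemma 1. Given a circuit formula $\varphi$ and $z \in \mathbb{N}$, an accuracy parameter $\varepsilon > 0$, a confidence parameter $\delta > 0$, and a batch size $t \in \mathbb{N}$ for which
$$\frac{1}{t\varepsilon^2}\left(\frac{\sum_{\sigma\in 2^Y}\frac{|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|}{|\mathsf{sol}(\varphi)_{\downarrow X}|}\left(\log\frac{z}{|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|}\right)^2}{\left(\sum_{\sigma\in 2^Y}\frac{|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|}{|\mathsf{sol}(\varphi)_{\downarrow X}|}\log\frac{z}{|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|}\right)^2} - 1\right) \le \frac{1}{6},$$
the algorithm SampleEst returns an estimate $\hat h$ such that, with probability at least $1-\delta$,
$$(1-\varepsilon)\sum_{\sigma\in 2^Y}\frac{|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|}{|\mathsf{sol}(\varphi)_{\downarrow X}|}\log\frac{z}{|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|} \le \hat h \le (1+\varepsilon)\sum_{\sigma\in 2^Y}\frac{|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|}{|\mathsf{sol}(\varphi)_{\downarrow X}|}\log\frac{z}{|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|}.$$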
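Proof. Let $R_{ij}$ be the random value taken by $r$ in the $i$th iteration of the outer loop and $j$th iteration of the inner loop. We observe that $\{R_{ij}\}_{(i,j)}$ are a family of i.i.d. random variables. Let $C_i = \frac{1}{t}\sum_{j=1}^{t}\log\frac{1}{R_{ij}}$ be the value appended to $C$ at the end of the $i$th iteration of the loop. Clearly $\mathsf{E}[C_i] = \mathsf{E}\left[\log\frac{1}{R_{ij}}\right]$. Furthermore, we observe that by independence of the $R_{ij}$,
$$\mathsf{variance}[C_i] = \frac{1}{t}\mathsf{variance}\left[\log\frac{1}{R_{ij}}\right] = \frac{1}{t}\left(\mathsf{E}\left[\left(\log\frac{1}{R_{ij}}\right)^2\right] - \mathsf{E}\left[\log\frac{1}{R_{ij}}\right]^2\right).$$
By Chebyshev's inequality, now,
$$\Pr\left[\left|C_i - \mathsf{E}\left[\log\frac{1}{R_{ij}}\right]\right| > \varepsilon\,\mathsf{E}\left[\log\frac{1}{R_{ij}}\right]\right] \le \frac{\mathsf{variance}[C_i]}{\varepsilon^2\,\mathsf{E}\left[\log\frac{1}{R_{ij}}\right]^2} = \frac{\mathsf{E}\left[\left(\log\frac{1}{R_{ij}}\right)^2\right] - \mathsf{E}\left[\log\frac{1}{R_{ij}}\right]^2}{t\varepsilon^2\,\mathsf{E}\left[\log\frac{1}{R_{ij}}\right]^2} \le \frac{1}{6}$$
by our assumption on $t$.

Let $L_i \in \{0,1\}$ be the indicator random variable for the event that $C_i < \mathsf{E}\left[\log\frac{1}{R_{ij}}\right] - \varepsilon\,\mathsf{E}\left[\log\frac{1}{R_{ij}}\right]$, and let $H_i \in \{0,1\}$ be the indicator random variable for the event that $C_i > \mathsf{E}\left[\log\frac{1}{R_{ij}}\right] + \varepsilon\,\mathsf{E}\left[\log\frac{1}{R_{ij}}\right]$. Since these are disjoint events, $B_i = L_i + H_i$ is also an indicator random variable, for their union. So long as $\sum_{i=1}^{T} L_i < T/2$ and $\sum_{i=1}^{T} H_i < T/2$, the value returned by SampleEst is as desired. By the above calculation, $\Pr[L_i = 1] + \Pr[H_i = 1] = \Pr[B_i = 1] \le 1/6$, and we note that $\{(B_i, L_i, H_i)\}_i$ are a family of i.i.d. random variables. Observe that by Hoeffding's inequality,
$$\Pr\left[\sum_{i=1}^{T} L_i \ge \frac{T}{2}\right] \le \exp\left(-2T\left(\frac{1}{3}\right)^2\right) \le \frac{\delta}{2},$$
and similarly $\Pr\left[\sum_{i=1}^{T} H_i \ge \frac{T}{2}\right] \le \frac{\delta}{2}$. Therefore, by a union bound, the returned value is adequate with probability at least $1-\delta$ overall.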
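As a numeric sanity check of the last step, the Python snippet below computes $T$ and the Hoeffding tail $\exp(-2T(1/3)^2)$, where the deviation $\frac{1}{3} = \frac{1}{2} - \frac{1}{6}$ comes from $\Pr[L_i = 1] \le 1/6$. We assume the natural logarithm in $T = \lceil\frac{9}{2}\log\frac{2}{\delta}\rceil$; a base-2 logarithm would only make $T$ larger. The helper names are hypothetical.

```python
import math

def num_groups(delta):
    """T = ceil((9/2) * ln(2/delta)); natural logarithm assumed."""
    return math.ceil(4.5 * math.log(2 / delta))

def hoeffding_tail(T):
    """exp(-2T(1/2 - 1/6)^2): bound on Pr[sum_i L_i >= T/2] when Pr[L_i = 1] <= 1/6."""
    return math.exp(-2 * T * (0.5 - 1 / 6) ** 2)

for delta in (0.5, 0.09, 1e-3):
    T = num_groups(delta)
    print(delta, T, hoeffding_tail(T) <= delta / 2)
```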


The analysis of SampleEst relied on a bound on the ratio of the first and second "moments" of the self-information in our truncated distribution. Suppose that for all assignments $\sigma$ to $Y$, $p_\sigma \le 1/2$. We observe that then $H_\varphi(Y) \ge \sum_{\sigma\in 2^Y} p_\sigma\cdot 1 = 1$. We also observe that, on account of the uniform distribution on $X$, any $\sigma$ in the support of the distribution has $p_\sigma \ge 1/2^{|X|}$. Such bounds allow us to bound the relative variance of the self-information:


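Lemma 2. Let $\{p_\sigma \in [1/2^{|X|}, 1]\}_{\sigma\in 2^Y}$ be given. Then
$$\sum_{\sigma\in 2^Y} p_\sigma\left(\log\frac{1}{p_\sigma}\right)^2 \le |X|\sum_{\sigma\in 2^Y} p_\sigma\log\frac{1}{p_\sigma}.$$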

A Scalable Shannon Entropy Estimator 373 


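Proof. We observe simply that
$$\sum_{\sigma\in 2^Y} p_\sigma\left(\log\frac{1}{p_\sigma}\right)^2 \le \log 2^{|X|}\sum_{\sigma\in 2^Y} p_\sigma\log\frac{1}{p_\sigma} = |X|\sum_{\sigma\in 2^Y} p_\sigma\log\frac{1}{p_\sigma}.$$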
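Lemma 2 holds term by term, since $p_\sigma \ge 2^{-|X|}$ implies $\log\frac{1}{p_\sigma} \le |X|$. A small Python check on a toy distribution (the output distribution of a random function on $|X| = 6$ input bits, an assumption for illustration; `lemma2_holds` is a hypothetical helper):

```python
import math
import random

def lemma2_holds(probs, x_bits):
    """Check sum p (log 1/p)^2 <= |X| * sum p log 1/p for p in [2^-|X|, 1]."""
    lhs = sum(p * math.log2(1 / p) ** 2 for p in probs)
    rhs = x_bits * sum(p * math.log2(1 / p) for p in probs)
    return lhs <= rhs * (1 + 1e-12)

rng = random.Random(7)
x_bits = 6
# Output distribution of a random function on 6 input bits: every p_sigma is a
# positive multiple of 2^-6, hence at least 2^-6.
outputs = [rng.randrange(5) for _ in range(2 ** x_bits)]
probs = [outputs.count(v) / 2 ** x_bits for v in set(outputs)]
print(lemma2_holds(probs, x_bits))
```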
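Lemma 3. Let $\{p_\sigma \in [0,1]\}_{\sigma\in 2^Y}$ be given with $\sum_{\sigma\in 2^Y} p_\sigma \le 1$ and
$$H = \sum_{\sigma\in 2^Y} p_\sigma\log\frac{1}{p_\sigma} \ge 1.$$
Then
$$\frac{\sum_{\sigma\in 2^Y} p_\sigma\left(\log\frac{1}{p_\sigma}\right)^2}{\left(\sum_{\sigma\in 2^Y} p_\sigma\log\frac{1}{p_\sigma}\right)^2} \le \left(1 + \frac{\log(|Y| + \log|Y| + 1.1)}{|Y|}\right)|Y|.$$
Similarly, if $H < 1$ and $|Y| \ge 2$,
$$\sum_{\sigma\in 2^Y} p_\sigma\left(\log\frac{1}{p_\sigma}\right)^2 \le |Y| + \log(|Y| + \log|Y| + 2.5).$$
Concretely, both cases give a bound that is at most $2|Y|$ for $|Y| \ge 3$; $|Y| = 8$ gives a bound that is less than $1.5\times|Y|$ in both cases, $|Y| = 64$ gives a bound that is less than $1.1\times|Y|$, etc.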
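The concrete constants quoted after Lemma 3 can be reproduced with a few lines of Python; `bound_high` and `bound_low` are hypothetical names for the two right-hand sides (logarithms base 2).

```python
import math

def bound_high(y):
    """Lemma 3, case H >= 1: |Y| + log(|Y| + log|Y| + 1.1)."""
    return y + math.log2(y + math.log2(y) + 1.1)

def bound_low(y):
    """Lemma 3, case H < 1: |Y| + log(|Y| + log|Y| + 2.5)."""
    return y + math.log2(y + math.log2(y) + 2.5)

for y in (3, 8, 64):
    print(y, round(bound_high(y) / y, 3), round(bound_low(y) / y, 3))
```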


Proof. By induction on the size of the support, denoted as $supp$ and defined as $\{\sigma \in 2^Y \mid p_\sigma > 0\}$, we will show that when $H \ge 1$, the ratio is at most $\log|supp| + \log(\log|supp| + \log\log|supp| + 1.1)$. The base case is when there are only two elements ($|Y| = 1$), in which case we must have $p_0 = p_1 = 1/2$, and the ratio is uniquely determined to be 1. For the induction step, observe that whenever any subset of the $p_\sigma$ take value 0, this is equivalent to a distribution with smaller support, for which, by the induction hypothesis, we find the ratio is at most
$$\log(|supp|-1) + \log\big(\log(|supp|-1) + \log\log(|supp|-1) + 1.1\big) < \log|supp| + \log\big(\log|supp| + \log\log|supp| + 1.1\big).$$
Consider any value of $H_\varphi(Y) = H$. With the entropy fixed, we need only maximize the numerator of the ratio with $H_\varphi(Y) = H$. Indeed, we have already ruled out a ratio of $|supp(Y)|$ for solutions in which any of the $p_\sigma$ take value 0, and clearly we cannot have any $p_\sigma = 1$, so we only need to consider interior points that are local optima. We use the method of Lagrange multipliers: for some $\lambda$, all $p_\sigma$ must satisfy $\log^2 p_\sigma + 2\log p_\sigma - \lambda(\log p_\sigma + 1) = 0$, which has solutions
$$\log p_\sigma = \frac{\lambda}{2} - 1 \pm \sqrt{1 + \frac{\lambda^2}{4}}.$$
We note that the second derivatives with respect to $p_\sigma$ are equal to $\frac{2\log p_\sigma + 2 - \lambda}{p_\sigma}$, which are negative iff $\log p_\sigma < \frac{\lambda}{2} - 1$; hence we attain local maxima only for the solution $\log p_\sigma = \frac{\lambda}{2} - 1 - \sqrt{1 + \frac{\lambda^2}{4}}$. Thus, there is a single $p_\sigma$, which by the entropy constraint must satisfy $|supp|\,p_\sigma\log\frac{1}{p_\sigma} = H$, which we will show gives
$$p_\sigma = \frac{H}{|supp|\left(\log\frac{|supp|}{H} + \log\log\frac{|supp|}{H} + \rho\right)}$$
for some $\rho \le 1.1$. Substituting this form into the entropy constraint, it suffices to find a fixed point
$$\rho = \log\left(\frac{\log\frac{|supp|}{H} + \log\log\frac{|supp|}{H} + \rho}{\log\frac{|supp|}{H}}\right).$$
For $|supp| = 3$, we know $1 \le H \le \log 3$, and we can verify numerically that $\log\left(\frac{\log\frac{3}{H} + \log\log\frac{3}{H} + \rho}{\log\frac{3}{H}}\right) \in (0.42, 0.72)$ for $\rho \in [0,1]$. Hence, by Brouwer's fixed point theorem, such a choice of $\rho \in [0,1]$ exists. For $|supp| \ge 4$, observe that $\frac{|supp|}{H} \ge 2$, so $\log\log\frac{|supp|}{H} \ge 0$. For $|supp| = 4$, $\log\left(\frac{\log\frac{4}{H} + \log\log\frac{4}{H} + \rho}{\log\frac{4}{H}}\right) \in (0,1]$ for $\rho \in (0,1]$, and similarly, for all integer values of $|supp|$ up to 15, $\log\left(\frac{\log\frac{|supp|}{H} + \log\log\frac{|supp|}{H} + 1.1}{\log\frac{|supp|}{H}}\right) < 1.1$, so we can obtain $\rho \in (0, 1.1)$. Finally, for $|supp| \ge 16$, we have $\frac{|supp|}{H} \ge \frac{|supp|}{\log|supp|} \ge 4$, hence $\log\log\frac{|supp|}{H} + \rho \le \log\frac{|supp|}{H}$ for $\rho \le 1$, so
$$\log\left(\frac{\log\frac{|supp|}{H} + \log\log\frac{|supp|}{H} + \rho}{\log\frac{|supp|}{H}}\right) \le 1.$$
Hence it is clear that this gives $H$ for some $\rho \le 1$. Observe that for such a choice of $\rho$, using the substitution above, the ratio we attain is
$$\frac{|supp|\,p_\sigma\left(\log\frac{1}{p_\sigma}\right)^2}{H^2} = \frac{\log\frac{1}{p_\sigma}}{H} = \frac{\log\frac{|supp|}{H} + \log\left(\log\frac{|supp|}{H} + \log\log\frac{|supp|}{H} + \rho\right)}{H},$$
which is monotone in $1/H$, so using the fact that $H \ge 1$, we find it is at most
$$\log|supp| + \log(\log|supp| + \log\log|supp| + \rho),$$
which, recalling $\rho \le 1.1$, gives the claimed bound.
For the second part, observe that by the same considerations, for fixed $H$,
$$\sum_{\sigma\in 2^Y} p_\sigma\left(\log\frac{1}{p_\sigma}\right)^2 = H\log\frac{1}{p_\sigma}$$
for the unique choice of $p_\sigma$ for $|Y|$ and $H$ as above, i.e., we will show that for $|Y| \ge 2$, it is
$$H\left(\log\frac{2^{|Y|}}{H} + \log\left(\log\frac{2^{|Y|}}{H} + \log\log\frac{2^{|Y|}}{H} + \rho\right)\right)$$
for some $\rho \in (0, 2.5)$. Indeed, we again consider the function
$$f(\rho) = \log\left(\log\frac{2^{|Y|}}{H} + \log\log\frac{2^{|Y|}}{H} + \rho\right) - \log\log\frac{2^{|Y|}}{H}$$
and observe that for $\frac{2^{|Y|}}{H} \ge 2$, $f(0) \ge 0$. Now, when $|Y| \ge 2$ and $H < 1$, $\frac{2^{|Y|}}{H} > 4$. We will see that the function $d(\rho) = f(\rho) - \rho$ has no critical points for $\frac{2^{|Y|}}{H} > 4$ and $\rho > 0$, and hence its maximum is attained at the boundary, i.e., at $\frac{2^{|Y|}}{H} = 4$, at which point we see that $f(2.5) < 2.5$. So, for such values of $\frac{2^{|Y|}}{H}$, $f$ maps $[0, 2.5]$ into $[0, 2.5]$, and hence, by Brouwer's fixed point theorem again, for all $|Y| \ge 2$ and $H < 1$ some $\rho \in (0, 2.5)$ exists for which $\log\frac{1}{p_\sigma} = \log\frac{2^{|Y|}}{H} + \log\left(\log\frac{2^{|Y|}}{H} + \log\log\frac{2^{|Y|}}{H} + \rho\right)$ gives $\sum_{\sigma\in 2^Y} p_\sigma\log\frac{1}{p_\sigma} = H$. Indeed, $d'(\rho) = \frac{1}{\ln 2\left(\log\frac{2^{|Y|}}{H} + \log\log\frac{2^{|Y|}}{H} + \rho\right)} - 1$, which has a singularity at $\rho = -\log\frac{2^{|Y|}}{H} - \log\log\frac{2^{|Y|}}{H}$, and otherwise has a critical point at $\rho = \frac{1}{\ln 2} - \log\frac{2^{|Y|}}{H} - \log\log\frac{2^{|Y|}}{H}$. Since $\log\frac{2^{|Y|}}{H} > 2$ and $\log\log\frac{2^{|Y|}}{H} > 1$ here, these are both clearly negative.

Now, we will show that this expression (for $|Y| \ge 2$) is maximized when $H = 1$. Observe first that the expression $H\left(|Y| + \log\frac{1}{H}\right)$, as a function of $H$, does not have critical points for $H < 1$: the derivative is $|Y| + \log\frac{1}{H} - \frac{1}{\ln 2}$, so critical points require $H = 2^{|Y| - (1/\ln 2)} > 1$. Hence we see that this expression is maximized at the boundary, when $H = 1$. Similarly, the rest of the expression,
$$H\log\left(|Y| + \log\frac{1}{H} + \log\left(|Y| + \log\frac{1}{H}\right) + 2.5\right),$$
viewed as a function of $H$, only has critical points for
$$\log\left(|Y| + \log\frac{1}{H} + \log\left(|Y| + \log\frac{1}{H}\right) + 2.5\right) = \frac{\frac{1}{\ln^2 2}\left(1 + \frac{1}{\ln 2\,(|Y| + \log\frac{1}{H})}\right)}{|Y| + \log\frac{1}{H} + \log\left(|Y| + \log\frac{1}{H}\right) + 2.5},$$
i.e., it requires
$$\left(|Y| + \log\frac{1}{H} + \log\left(|Y| + \log\frac{1}{H}\right) + 2.5\right)\log\left(|Y| + \log\frac{1}{H} + \log\left(|Y| + \log\frac{1}{H}\right) + 2.5\right) = \frac{1}{\ln^2 2}\left(1 + \frac{1}{\ln 2\,(|Y| + \log\frac{1}{H})}\right).$$
But the right-hand side is at most $\frac{1}{\ln^2 2}\left(1 + \frac{1}{2\ln 2}\right) < 4$, while the left-hand side is at least 13. Thus, it also has no critical points, and its maximum is similarly taken at the boundary, $H = 1$. Thus, overall, when $H < 1$ and $|Y| \ge 2$, we find
$$\sum_{\sigma\in 2^Y} p_\sigma\left(\log\frac{1}{p_\sigma}\right)^2 \le |Y| + \log(|Y| + \log|Y| + 2.5).$$


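Although the assignment of probability mass used in the bound did not sum to 1, this bound is nevertheless nearly tight. For any $\gamma > 0$, letting $H = 1 + \Delta$ where $\Delta = \frac{1}{|Y|^{\gamma}}$, the following solution attains a ratio of $(1-o(1))|Y|^{1-\gamma}$: for any two $\sigma_1^*, \sigma_2^* \in 2^Y$, set $p_{\sigma_1^*} = p_{\sigma_2^*} = \frac{1-\varepsilon}{2}$ and set the rest to $\frac{\varepsilon}{2^{|Y|}-2}$, for $\varepsilon$ chosen below. To obtain
$$H = 2\cdot\frac{1-\varepsilon}{2}\log\frac{2}{1-\varepsilon} + (2^{|Y|}-2)\cdot\frac{\varepsilon}{2^{|Y|}-2}\log\frac{2^{|Y|}-2}{\varepsilon} = (1-\varepsilon)\left(1 + \log\left(1 + \frac{\varepsilon}{1-\varepsilon}\right)\right) + \varepsilon\log\frac{2^{|Y|}-2}{\varepsilon},$$
observe that since $\log(1+x) = \frac{x}{\ln 2} + O(x^2)$, we will need to take
$$\varepsilon = \frac{\Delta}{\log(2^{|Y|}-2) + \log\frac{1}{\varepsilon} - \left(1 - \frac{1}{\ln 2}\right) + O(\varepsilon)} = \Theta\left(\frac{\Delta}{\log(2^{|Y|}-2) + \log\log(2^{|Y|}-2)}\right).$$
For such a choice, we indeed obtain the ratio
$$\frac{(1-\varepsilon)\log^2\frac{2}{1-\varepsilon} + \varepsilon\log^2\frac{2^{|Y|}-2}{\varepsilon}}{H^2} \ge (1-o(1))|Y|^{1-\gamma}.$$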
Using these bounds, we are finally ready to prove Theorem 1: 


Proof. We first consider the case where no $\sigma \in \mathsf{sol}(\varphi)_{\downarrow Y}$ has $p_\sigma > 1/2$; here, the condition in line 6 of EntropyEstimation never passes, so we return the value obtained by SampleEst on line 13. Note that we must have $H_\varphi(Y) \ge 1$ in this case. So, by Lemmas 2 and 3,
$$\frac{\sum_{\sigma\in 2^Y} p_\sigma\left(\log\frac{1}{p_\sigma}\right)^2}{\left(\sum_{\sigma\in 2^Y} p_\sigma\log\frac{1}{p_\sigma}\right)^2} \le \min\left\{|X|,\ \left(1 + \frac{\log(|Y| + \log|Y| + 1.1)}{|Y|}\right)|Y|\right\},$$
and hence, by Lemma 1, using
$$t \ge \frac{6\cdot\left(\min\{|X|,\ |Y| + \log(|Y| + \log|Y| + 1.1)\} - 1\right)}{\varepsilon^2}$$
suffices to ensure that the returned $\hat h$ is satisfactory with probability $1-\delta$.

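Next, we consider the case where some $\sigma^* \in \mathsf{sol}(\varphi)_{\downarrow Y}$ has $p_{\sigma^*} > 1/2$. Since the total probability is 1, there can be at most one such $\sigma^*$. So, in the distribution conditioned on $\sigma \ne \sigma^*$, i.e., $\{p'_\sigma\}_{\sigma\in 2^Y}$ that sets $p'_{\sigma^*} = 0$ and $p'_\sigma = \frac{p_\sigma}{1-p_{\sigma^*}}$ otherwise, we now need to show that $t$ satisfies
$$\frac{1}{t\varepsilon^2}\left(\frac{\sum_{\sigma\ne\sigma^*} p'_\sigma\left(\log\frac{1}{(1-p_{\sigma^*})p'_\sigma}\right)^2}{\left(\sum_{\sigma\ne\sigma^*} p'_\sigma\log\frac{1}{(1-p_{\sigma^*})p'_\sigma}\right)^2} - 1\right) \le \frac{1}{6}$$
to apply Lemma 1. We first rewrite this expression. Letting $H = \sum_{\sigma\ne\sigma^*} p'_\sigma\log\frac{1}{p'_\sigma}$ be the entropy of this conditional distribution,
$$\begin{aligned}
\frac{\sum_{\sigma\ne\sigma^*} p'_\sigma\left(\log\frac{1}{p'_\sigma} + \log\frac{1}{1-p_{\sigma^*}}\right)^2}{\left(H + \log\frac{1}{1-p_{\sigma^*}}\right)^2} - 1
&= \frac{\sum_{\sigma\ne\sigma^*} p'_\sigma\left(\log\frac{1}{p'_\sigma}\right)^2 + 2H\log\frac{1}{1-p_{\sigma^*}} + \left(\log\frac{1}{1-p_{\sigma^*}}\right)^2}{\left(H + \log\frac{1}{1-p_{\sigma^*}}\right)^2} - 1\\
&= \frac{\sum_{\sigma\ne\sigma^*} p'_\sigma\left(\log\frac{1}{p'_\sigma}\right)^2 - H^2}{\left(H + \log\frac{1}{1-p_{\sigma^*}}\right)^2}.
\end{aligned}$$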
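The rewriting above is purely algebraic. The Python snippet below checks it numerically on a toy conditional distribution of our own choosing, together with the bound $\frac{|X|}{2\log\frac{1}{1-p_{\sigma^*}}}$ that Lemma 2 yields for this quantity (every $p_\sigma$ in the toy example is at least $2^{-5}$, so $|X| = 5$ is an admissible choice for the check).

```python
import math

p_star = 0.6                          # p_{sigma*} > 1/2 (toy value)
rest = [0.25, 0.1, 0.05]              # remaining outputs, summing to 1 - p_star
q = [p / (1 - p_star) for p in rest]  # conditional distribution p'_sigma
a = math.log2(1 / (1 - p_star))       # log 1/(1 - p_{sigma*})

H = sum(pp * math.log2(1 / pp) for pp in q)
first = sum(pp * (math.log2(1 / pp) + a) for pp in q)
second = sum(pp * (math.log2(1 / pp) + a) ** 2 for pp in q)

lhs = second / first ** 2 - 1
rhs = (sum(pp * math.log2(1 / pp) ** 2 for pp in q) - H ** 2) / (H + a) ** 2
print(round(lhs, 6), round(rhs, 6))
```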
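Lemma 2 now gives rather directly that this quantity is at most
$$\frac{H|X| - H^2}{\left(H + \log\frac{1}{1-p_{\sigma^*}}\right)^2} \le \frac{|X|}{2\log\frac{1}{1-p_{\sigma^*}}}.$$
For the bound in terms of $|Y|$, there are now two cases depending on whether $H$ is greater than 1 or less than 1. When it is greater than 1, the first part of Lemma 3 again gives
$$\frac{\sum_{\sigma\in 2^Y} p'_\sigma\left(\log\frac{1}{p'_\sigma}\right)^2}{H^2} \le |Y| + \log(|Y| + \log|Y| + 1.1).$$
When $H < 1$, on the other hand, recalling $p_{\sigma^*} > 1/2$ (so $\log\frac{1}{1-p_{\sigma^*}} > 1$), the second part of Lemma 3 gives that our expression is less than
$$\frac{|Y| + \log(|Y| + \log|Y| + 2.5) - H^2}{\left(H + \log\frac{1}{1-p_{\sigma^*}}\right)^2} < |Y| + \log(|Y| + \log|Y| + 2.5).$$
Thus, by Lemma 1,
$$t \ge \frac{6\cdot\min\left\{\frac{|X|}{2\log\frac{1}{1-p_{\sigma^*}}},\ |Y| + \log(|Y| + \log|Y| + 2.5)\right\}}{\varepsilon^2}$$
suffices to obtain $\hat h_{rem}$ such that $\hat h_{rem} \le (1+\varepsilon)\sum_{\sigma\ne\sigma^*}\frac{p_\sigma}{1-p_{\sigma^*}}\log\frac{1}{p_\sigma}$ and $\hat h_{rem} \ge (1-\varepsilon)\sum_{\sigma\ne\sigma^*}\frac{p_\sigma}{1-p_{\sigma^*}}\log\frac{1}{p_\sigma}$; hence we obtain an adequate $\hat h$ with probability at least $1 - 0.9\cdot\delta$ in line 10, if we pass the test on line 6 of Algorithm 1, thus identifying $\sigma^*$. Note that this value is adequate, so we need only guarantee that the test on line 6 passes on one of the iterations with probability at least $1 - 0.1\cdot\delta$.

To this end, note that each sample $\tau_{\downarrow Y}$ on line 4 is equal to $\sigma^*$ with probability $\frac{|\mathsf{sol}(\varphi(Y\leftarrow\sigma^*))|}{|\mathsf{sol}(\varphi)_{\downarrow X}|} = p_{\sigma^*} > \frac{1}{2}$ by hypothesis. Since each iteration of the loop is an independent draw, the probability that the condition on line 6 is not met after $\log\frac{10}{\delta}$ draws is less than $\left(1 - \frac{1}{2}\right)^{\log\frac{10}{\delta}} = \frac{\delta}{10}$, as needed.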
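The failure probability of the detection loop can be checked numerically. The snippet assumes that the logarithm in the loop bound of Algorithm 1 is base 2 and that the number of rounds is rounded up; `miss_probability` is a hypothetical helper.

```python
import math

def miss_probability(p_high, delta):
    """Chance that all ceil(log2(10/delta)) independent draws miss sigma*."""
    rounds = math.ceil(math.log2(10 / delta))
    return (1 - p_high) ** rounds

for delta in (0.5, 0.09, 1e-3):
    print(delta, miss_probability(0.5, delta), delta / 10)
```

Since $p_{\sigma^*} > 1/2$, the value at $p_{\sigma^*} = 1/2$ is the worst case, and rounding the number of rounds up only lowers the miss probability further.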


4.4 Beyond Boolean Formulas 


We now focus on the case where the relationship between $X$ and $Y$ is modeled by an arbitrary relation $R$ instead of a Boolean formula $\varphi$. As noted in Sect. 1, program behaviors are often modeled with other representations such as automata [4,5,14]. The automata-based modeling often has $X$ represented as the input to the given automaton $A$, while every realization of $Y$ corresponds to a state of $A$. Instead of an explicit description of $A$, one can rely on a symbolic description of $A$. Two families of techniques are currently used to estimate the entropy. The first technique is to enumerate the possible output states and, for each such state $s$, estimate the number of strings accepted by $A$ if $s$ were the only accepting state of $A$. The other technique relies on uniformly sampling a string $\sigma$, noting the final state of $A$ when run on $\sigma$, and then applying a histogram-based technique to estimate the entropy.

In order to use the algorithm EntropyEstimation, one requires access to a sampler and a model counter for automata; the past few years have witnessed the design of efficient counters for automata to handle string constraints. In addition, EntropyEstimation requires access to a conditioning routine to implement the substitution step, i.e., $Y \leftarrow \tau_{\downarrow Y}$, which is easy to accomplish for automata via marking the corresponding state as a non-accepting state.
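As an illustration of the counting side for automata, here is a minimal Python sketch under our own assumptions: a small explicit DFA (not a symbolic one), inputs drawn uniformly among all strings of a fixed length, and the final state playing the role of $Y$. Counting the strings that end in each state corresponds to the first technique above, and dropping a state from the count mirrors making it non-accepting. The DFA and the helper names are hypothetical.

```python
import math

def final_state_counts(transition, start, num_states, alphabet, length):
    """counts[s] = number of strings of the given length whose run ends in state s."""
    counts = [0] * num_states
    counts[start] = 1
    for _ in range(length):
        nxt = [0] * num_states
        for state, c in enumerate(counts):
            for a in alphabet:
                nxt[transition[(state, a)]] += c
        counts = nxt
    return counts

# Toy DFA over {0, 1}: the state records how many 1s were read, capped at 2.
transition = {(s, a): min(s + a, 2) for s in range(3) for a in (0, 1)}
counts = final_state_counts(transition, start=0, num_states=3, alphabet=(0, 1), length=4)
total = sum(counts)                   # number of length-4 inputs
entropy = sum(c / total * math.log2(total / c) for c in counts if c)
without_state_2 = total - counts[2]   # count once state 2 is made non-accepting
print(counts, round(entropy, 4), without_state_2)  # [1, 4, 11] 1.1216 5
```

The dynamic program is linear in the input length and in the number of transitions, which is why explicit-state counting is cheap; the difficulty addressed by symbolic counters is that realistic automata are far too large to enumerate this way.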


5 Empirical Evaluation 


To evaluate the runtime performance of EntropyEstimation, we implemented a prototype in Python that employs SPUR [3] as a uniform sampler and GANAK [52] as a projected model counter. We experimented with 96 Boolean formulas arising from diverse applications ranging from QIF benchmarks [32], plan recognition [54], bit-blasted versions of SMTLIB benchmarks [52,54], and QBFEval competitions [1,2]. The value of $n = |X|$ varies from 5 to 752, while the value of $m = |Y|$ varies from 9 to 1447.

In all of our experiments, the parameters $\delta$ and $\varepsilon$ were set to 0.09 and 0.8, respectively. All of our experiments were conducted on a high-performance computer cluster with each node consisting of an E5-2690 v3 CPU with 24 cores and 96 GB of RAM, with a memory limit set to 4 GB per core. Experiments were run in single-threaded mode on a single core with a timeout of 3000 s.


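Baseline: As our baseline, we implemented the following approach to compute the entropy exactly, which is representative of the current state-of-the-art approaches [13,27,39].² For each valuation $\sigma \in \mathsf{sol}(\varphi)_{\downarrow Y}$, we compute $p_\sigma = \frac{|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|}{|\mathsf{sol}(\varphi)_{\downarrow X}|}$, where $|\mathsf{sol}(\varphi(Y\leftarrow\sigma))|$ is the count of satisfying assignments of $\varphi(Y\leftarrow\sigma)$, and $|\mathsf{sol}(\varphi)_{\downarrow X}|$ represents the projected model count of $\varphi$ over $X$. Then, finally, the entropy is computed as $\sum_{\sigma\in 2^Y} p_\sigma\log\frac{1}{p_\sigma}$.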
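The enumeration baseline can be sketched in Python as follows; this is our own illustration rather than the code used in the experiments. A brute-force count over inputs stands in for each model counting call, and the circuit is a toy function from input bits to output bits (all names are hypothetical). The returned query count equals $|\mathsf{sol}(\varphi)_{\downarrow Y}|$, which is what makes the baseline infeasible when the number of outputs is large.

```python
import itertools
import math

def baseline_entropy(f, n):
    """Exact entropy by enumerating outputs; returns (entropy, count_queries)."""
    inputs = list(itertools.product((0, 1), repeat=n))
    z = len(inputs)                     # projected count over X: one output per input
    outputs = {f(x) for x in inputs}    # the distinct outputs, found here by enumeration
    entropy, queries = 0.0, 0
    for sigma in outputs:
        count = sum(1 for x in inputs if f(x) == sigma)  # stands in for one counter call
        queries += 1
        p = count / z
        entropy += p * math.log2(1 / p)
    return entropy, queries

def toy_circuit(x):
    return (x[0] ^ x[1], x[2] & x[3])

h, q = baseline_entropy(toy_circuit, 4)
print(round(h, 4), q)
```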
Our evaluation demonstrates that EntropyEstimation can scale to formulas beyond the reach of the enumeration-based baseline approach. Within the given timeout of 3000 s, EntropyEstimation is able to estimate the entropy for all the benchmarks, whereas the baseline approach could terminate only for 14 benchmarks. Furthermore, EntropyEstimation estimated the entropy within the allowed tolerance for all the benchmarks.


5.1 Scalability of EntropyEstimation 


Table 1 presents the performance of EntropyEstimation vis-à-vis the baseline approach for 20 benchmarks.³ Column 1 of Table 1 gives the names of the


2 We wish to emphasize that none of the previous approaches could provide theoretical guarantees of $(\varepsilon,\delta)$ without enumerating over all possible assignments to $Y$.

3 The complete analysis for all of the benchmarks is deferred to the technical report 
https://arxiv.org/pdf/2206.00921.pdf. 
Table 1. "-" represents that the entropy could not be estimated due to timeout. Note that m = |Y| and n = |X|.

Benchmark        |X|   |Y|    Baseline                  EntropyEstimation
                              Time(s)   Count queries   Time(s)   Count/sample queries
pwd-backdoor     336   64     -         1.84×10^19      5.41      1.25×10^?
case31           13    40     201.02    1.02×10^?       125.36    5.65×10^?
case23           14    63     420.85    2.05×10^?       141.17    6.10×10^?
s1488_15_7       14    927    1037.71   3.84×10^?       150.29    6.10×10^?
bug1-fix-4       53    17     373.52    1.76×10^?       212.37    9.60×10^?
s832a_15_7       23    670    -         2.65×10^?       247       1.04×10^?
dyn-fix-1        40    48     -         3.30×10^4       252.2     1.83×10^?
s1196a_7_4       32    676    -         4.22×10^?       343.68    1.46×10^?
backdoor-2x16    168   32     -         1.31×10^5       405.7     1.70×10^?
CVE-2007         752   32     -         4.29×10^?       654.54    1.70×10^?
subtraction32    65    218    -         1.84×10^?       860.88    3.00×10^?
case-1_b11-1     48    292    -         2.75×10^?       1164.36   2.20×10^?
s420_15_7-1      235   116    -         3.52×10^?       1187.23   5.72×10^?
case145          64    155    -         7.04×10^13      1243.11   2.96×10^?
floor64-1        405   161    -         2.32×10^?       1764.2    7.85×10^?
s641_7_4         54    453    -         1.74×10^?       1849.84   2.48×10^?
decomp64         381   191    -         6.81×10^?       2239.62   9.26×10^?
squaring2        72    813    -         6.87×10^?       2348.6    3.33×10^?
stmt5_731-730    379   311    -         3.49×10^?       2814.58   1.49×10^4
stmt5_731-730 379 311 - 3.49x10!° 2814.58 1.49x10* 


benchmarks, while columns 2 and 3 list the numbers of $X$ and $Y$ variables. Columns 4 and 5 present the time taken and the number of queries used by the baseline approach, respectively, and columns 6 and 7 present the same for EntropyEstimation. The required number of count queries for the baseline approach is $|\mathsf{sol}(\varphi)_{\downarrow Y}|$.

Table 1 clearly demonstrates that EntropyEstimation outperforms the base- 
line approach. As shown in Table 1, there are some benchmarks for which the 
projected model count on V is greater than 10°°, i.e., the baseline approach 
would need 10°° valuations to compute the entropy exactly. By contrast, the 
proposed algorithm EntropyEstimation needed at most on the order of $10^4$ samples to estimate the entropy within the given tolerance and confidence. The number of samples required to estimate the entropy is reduced significantly with our proposed approach, making it scalable.


5.2 Quality of Estimates 


There were only 14 benchmarks out of 96 for which the enumeration-based base- 
line approach finished within a given timeout of 3000s. Therefore, we compared 


380 P. Golia et al. 


the entropy estimated by EntropyEstimation with the baseline for those 14 benchmarks only. Figure 1 shows how accurate the estimates of the entropy by EntropyEstimation were. The y-axis represents the observed error, which was calculated as $\max\left(\frac{\text{Estimated}}{\text{Exact}} - 1, \frac{\text{Exact}}{\text{Estimated}} - 1\right)$, and the x-axis represents the benchmarks ordered in ascending order of observed error; that is, a bar at $x$ represents the observed error for a benchmark (the lower, the better).
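The error metric plotted in Fig. 1 is straightforward to compute; a minimal Python version (the function name is ours):

```python
def observed_error(estimated, exact):
    """max(estimated/exact - 1, exact/estimated - 1), the metric plotted in Fig. 1."""
    return max(estimated / exact - 1, exact / estimated - 1)

print(round(observed_error(1.2, 1.0), 3), round(observed_error(0.8, 1.0), 3))  # 0.2 0.25
```

Because it takes the larger of the two ratios, this metric is never smaller than $\left|\frac{\text{Estimated}}{\text{Exact}} - 1\right|$, so it is a slightly stricter yardstick than the plain relative error.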


[Figure 1 (bar chart): y-axis "Observed error", x-axis "Benchmarks" (1 to 14); a horizontal line marks the allowed tolerance.]

Fig. 1. The accuracy of estimated entropy using EntropyEstimation for 14 benchmarks. $\varepsilon = 0.8$, $\delta = 0.09$. (Color figure online)


The red horizontal line in Fig. 1 indicates the maximum allowed tolerance ($\varepsilon$), which was set to 0.80 in our experiments. We observe that for all 14 benchmarks, EntropyEstimation estimated the entropy within the allowed tolerance; in fact, the observed error was greater than 0.1 for just 2 out of the 14 benchmarks, and the maximum error observed was 0.29.


Alternative Baselines: As we discussed earlier, several other algorithms have been proposed for estimating the entropy. For example, Valiant and Valiant's algorithm [58] obtains an $\varepsilon$-additive approximation using $O\left(\frac{2^m}{\varepsilon^2 m}\right)$ samples, and Chakraborty et al. [17] compute such approximations using $O\left(\frac{m^7}{\varepsilon^8}\right)$ samples. We stress that neither of these is exact, and thus neither could be used to assess the accuracy of our method as presented in Fig. 1. Moreover, based on Table 1, we observe that the number of sampling or counting calls that could be computed within the timeout was roughly $2\times 10^4$, where $m$ ranges between $10^1$ and $10^3$. Thus, the method of Chakraborty et al. [17], which would take $10^7$ or more samples on all benchmarks, would not be competitive with our method, which never used as many as $2\times 10^4$ calls. The method of Valiant and Valiant, on the other hand, would likely allow a few more benchmarks to be estimated (perhaps up to a fifth of the benchmarks). Still, it would not be competitive with our technique except in
the smallest benchmarks (for which the baseline required < 10° samples, about 
a third of our benchmarks), since we were otherwise more than a factor of m 
faster than the baseline. 
6 Conclusion


In this work, we considered estimating the Shannon entropy of a distribution 
specified by a circuit formula φ(X, Y). Prior work relied on $O(2^m)$ model counting queries and, therefore, could not scale to instances beyond small values of m. In contrast, we propose a novel technique, called EntropyEstimation, for estimating entropy that takes advantage of access to the formula φ via conditioning. EntropyEstimation makes only O(min(m, n)) model counting and sampling queries, and therefore scales significantly better than prior approaches.
Abstract. Secure multi-party computation (MPC) is a promising technique for privacy-preserving applications. A number of MPC frameworks have been proposed to reduce the burden of designing customized protocols, allowing non-experts to quickly develop and deploy MPC applications. To improve performance, recent MPC frameworks allow users to declare as secret only those variables that need to be protected. However, in practice, it is usually highly non-trivial for non-experts to specify secret variables: declaring too many degrades performance, while declaring too few compromises privacy. To address this problem, in this work we propose an automated security policy synthesis approach that declares as few secret variables as possible without compromising security. Our approach is a synergistic integration of type inference and symbolic reasoning. The former quickly infers a sound, but sometimes conservative, security policy, whereas the latter precisely identifies secret variables in a security policy that can be declassified. Moreover, the results from symbolic reasoning are fed back to type inference to refine the security types even further. We implement our approach in a new tool PoS4MPC. Experimental results on five typical MPC applications confirm the efficacy of our approach.
1 Introduction


Secure multi-party computation (MPC) is a powerful cryptographic paradigm, 
allowing mutually distrusting parties to collaboratively compute a public function over their private data without a trusted third party, revealing nothing beyond the result of the computation and their own private data [14,43]. MPC has potential for broader use in practical applications, e.g., truthful auctions, avoiding satellite collisions [22], private machine learning [41], and data analysis [35]. However, practical deployment of MPC has been limited due to its computational and communication complexity.

To foster applications of MPC, a number of general-purpose MPC frameworks have been proposed, e.g., [9,24,29,34,37,44]. These frameworks provide high-level languages for specifying MPC applications as well as compilers for translating them into executable implementations, thus drastically reducing the burden of designing customized protocols and allowing non-experts to quickly develop and deploy MPC applications. To improve performance, many MPC frameworks provide features to declare secret variables so that only these variables are protected. However, such frameworks usually do not rigorously verify whether there is information leakage or, on some occasions, provide only lightweight checking (via, e.g., information-flow analysis). Even though some frameworks are equipped with formal security guarantees, it is challenging for non-experts to develop an MPC program that simultaneously achieves good performance and formal security guarantees [3,28]. Typically, a user declares all variables secret, while ideally one would declare as few secret variables as possible to achieve good performance without compromising security.

In this work, we propose an automated security policy synthesis approach for MPC. We first formalize the leakage of an MPC application in the ideal world as a set of private inputs and define the notion of security policy, which assigns each variable a security level. This bridges the language-level and protocol-level leakages, hence our approach is independent of the specific MPC protocols being used. Based on the leakage characterization, we provide a type system to infer security policies by tracking both control- and data-flow of information from private inputs. While a security policy inferred from the type system formally guarantees that the MPC application will not leak more information than the result of the computation and participants' own private data, it may be too conservative. For instance, some variables could be declassified without compromising security, which would improve performance. Therefore, we propose a symbolic reasoning approach to identify secret variables in security policies that can be declassified without compromising security. We also feed the results of the symbolic reasoning back to type inference to refine the security types further.

We implement our approach in a new tool PoS4MPC (Policy Synthesis 
for MPC) based on the LLVM Compiler [1] and the KLEE symbolic execution 
engine [10]. Experimental results on five typical MPC applications show that our 
approach can generate less restrictive security policies than the type system alone. We also deploy the generated security policies in two MPC frameworks, Obliv-C [44] and MPyC [37]. The results show that, for instance, the security
policies generated by our approach can reduce the execution time by 31%-1.56 x 
Fig. 1. The richest one of three millionaires    Fig. 2. Ideal-world vs. real-world


10°%, the circuit size by 38%-3.61 x 10°%, and the communication traffic by 
39%-4.17 x 10°% in Obliv-C. 
To summarize, our main technical contributions are as follows. 


— A formalization of information leakage for MPC applications and the notion 
of security policy to bridge the language-level and protocol-level leakages; 

— An automated security policy synthesis approach that is able to generate less 
restrictive security policies; 

— An implementation of our approach for a real-world language and an evalu- 
ation on challenging benchmarks from the literature. 


Outline. Section 2 presents the motivation of this work and an overview of our approach. Section 3 gives the background of MPC. Section 4 introduces a simple language on which we formalize the leakage of MPC applications. We propose a type system for inferring security policies in Sect. 5 and a symbolic reasoning approach for declassification in Sect. 6. Implementation details and experimental results are given in Sect. 7. Finally, we discuss related work in Sect. 8 and conclude this paper in Sect. 9.
Missing proofs can be found in the full version of this paper [15].


2 Motivation 


Figure 1 shows a motivating example that computes the richest among three millionaires. To preserve privacy, the millionaires can privately send their inputs to a trusted third party (TTP), as shown in Fig. 2 (ideal world). This reveals the richest millionaire with the least leakage of information. Table 1 shows the leakage for each result r = 1, 2, 3, as well as the leakage if the secret branching variables c1 and c2 are declassified (i.e., changed from secret to public).


Table 1. Leakage from each result and declassified secret branching variables

Result    Leakage of result    Leakage of c1     Leakage of c2
r = 1     a > b ∧ a > c        a > b             a > c
r = 2     a < b ∧ b > c        a < b             b > c
r = 3     c > max(a, b)        a > b ∨ a < b     c > max(a, b)


388 Y. Fan et al. 


To achieve the same functionality without a TTP, secure multi-party computation (MPC) was proposed [14,43]. One can implement the computation using an MPC protocol π where all the parties collaboratively compute the result over their private inputs via network communications (shown in Fig. 2 (real-world)).

To facilitate applications of MPC, various MPC frameworks, e.g., Obliv- 
C [44], MP-SPDZ [24] and MPyC [37], have been proposed, which provide high- 
level languages for specifying MPC applications, as well as compilers for trans- 
lating them into executable implementations. To improve performance, these 
frameworks often allow users to declare secret variables so that only the values 
of secret variables are protected. However, in practice, it is usually quite challenging for non-experts to specify secret variables properly: declaring too many secret variables degrades performance, whereas declaring too few risks compromising security and privacy.

In this work, we propose an automated synthesis approach, aiming to declare 
as few secret variables as possible but without compromising security. To capture 
privacy, we formalize the leakage of MPC applications in the ideal-world as a set 
of private inputs. For instance, the leakage of the result r = 1 in the motivating example is the set of inputs such that $a > b \wedge a > c$. We introduce the notion
of security policy, which assigns each variable a security level, to bridge the 
language-level and protocol-level leakages, so that our approach is independent 
of specific MPC protocols being used. The language-level leakage of a security 
policy is characterized by a set of private inputs with respect to not only the 
result but also the values of public variables in the intermediate computations. 

Based on the leakage characterization, we propose a type system to automat- 
ically infer security policies, inspired by the work of proving noninterference of 
programs [40]. Our type system tracks both control-flow and data-flow of infor- 
mation from the private inputs, and infers a security policy. For instance, all the 
variables in the motivating example are inferred as secret. 

Although a security policy inferred by the type system formally guarantees 
that the MPC application will not leak more information than that in the ideal- 
world, it may be too conservative. For instance, declassifying the variable c2 in 
the example would not compromise security. As shown in Table 1, the leakage 
caused by declassifying c2 can be deduced from the leakage of the result. In 
contrast, we cannot declassify c1, as neither a > b nor a < b can be deduced 
from the leakage c > max(a,b). Once c1 is declassified, the adversary would 
learn whether a > b or a < b. This problem is akin to downgrading and declassification of high security levels in information-flow analysis [27], and could be solved via self-composition [39,42], which often requires users to write annotations for procedure contracts and loop invariants. In this work, for the sake of efficiency and
usability for non-experts, we propose an alternative approach based on symbolic 
execution. We leverage symbolic execution to finitely represent a potentially infi- 
nite set of concrete executions, and propose an automated approach to infer if 
a secret variable can be declassified by reasoning about pairs of symbolic exe- 
cutions. For instance, in Example 1, our approach is able to identify that c2 
can be declassified without compromising security. In general, the experimental 
results show that our approach is effective and the generated security policies 
can significantly improve the performance of MPC applications. 
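The declassification argument for Table 1 can be checked by brute force on a small input domain. The sketch below uses a hypothetical reconstruction of the program in Fig. 1 (whose code is not reproduced here), assuming pairwise-distinct inputs; it assigns c2 in both branches, so it does not follow the single-assignment convention introduced later. It treats a branching variable as declassifiable when its value is a function of the result r, which is a sufficient (not necessary) condition that ignores adversary-chosen inputs and the rest of the observable trace.

```python
from itertools import permutations


def richest(a: int, b: int, c: int):
    """Hypothetical reconstruction of the motivating example.

    Returns (r, c1, c2): the index of the richest millionaire and the values
    of the two branching variables. Inputs are assumed pairwise distinct.
    """
    c1 = a > b
    if c1:
        c2 = a > c
        r = 1 if c2 else 3
    else:
        c2 = b > c
        r = 2 if c2 else 3
    return r, c1, c2


def declassifiable(position: int, domain) -> bool:
    """position 1 = c1, 2 = c2. The variable may be made public if every input
    yielding a given result r also yields the same value of the variable, so
    observing it reveals nothing beyond r."""
    value_for_result = {}
    for a, b, c in permutations(domain, 3):
        outcome = richest(a, b, c)
        r, value = outcome[0], outcome[position]
        if value_for_result.setdefault(r, value) != value:
            return False
    return True


if __name__ == "__main__":
    print("c1 declassifiable:", declassifiable(1, range(4)))  # False
    print("c2 declassifiable:", declassifiable(2, range(4)))  # True
```

The outcome agrees with Table 1: c2 is True exactly when r ∈ {1, 2}, whereas for r = 3 the value of c1 still depends on whether a > b.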
3 Secure MPC


Fix a set of variables $\mathcal{X}$ over a domain D. We write $\vec{x}_n \in \mathcal{X}^n$ and $\vec{v}_n \in D^n$ for the tuples $(x_1, \dots, x_n)$ and $(v_1, \dots, v_n)$, respectively. (The subscript n may be dropped when it is clear from the context.)


MPC in the Ideal-World. An n-party MPC application $f : D^n \to D$ confidentially computes a given function $f(\vec{x})$, where each party $P_i$ for $1 \le i \le n$ sends her private input $v_i \in D$ to a TTP T, which computes and returns the result $f(\vec{v})$ to all the parties. In the ideal world, an adversary that controls any of the n parties learns no more than the output $f(\vec{v})$ and the private inputs of the corrupted (dishonest) parties.

We characterize the leakage of an MPC application $f(\vec{x})$ by a set of private inputs. Hereafter, we assume, w.l.o.g., that the first k parties (i.e., $P_1, \dots, P_k$) are corrupted by the adversary for some $k \ge 1$. For a given output $v \in D$, let $\simeq^f_v \subseteq D^n$ be the set $\{\vec{v} \in D^n \mid f(\vec{v}) = v\}$. Intuitively, $\simeq^f_v$ is the set of private inputs $\vec{v} \in D^n$ under which f evaluates to v. From the result v, the adversary is able to learn the set $\simeq^f_v$, but cannot tell which of its elements was the actual input. We refer to $\simeq^f_v$ as the indistinguishable space of the private inputs w.r.t. the result v. The input domain $D^n$ is then partitioned into the indistinguishable spaces $\{\simeq^f_v\}_{v \in D}$.


When the adversary controls the parties $P_1, \dots, P_k$, she learns the set $\mathsf{Leak}^{I}_{f}(v, \vec{v}_k) := \{(v_1', \dots, v_n') \in D^n \mid (v_1', \dots, v_k') = \vec{v}_k\} \cap \simeq^f_v$ from the result v and the adversary-chosen private inputs $\vec{v}_k \in D^k$.


Definition 1 (Leakage in the ideal world). For an MPC application $f(\vec{x}_n)$, the leakage of computing $v = f(\vec{v}_n)$ in the ideal world is $\mathsf{Leak}^{I}_{f}(v, \vec{v}_k)$, for the adversary-chosen private inputs $\vec{v}_k \in D^k$ and the result $v \in D$.
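When D is a small finite domain, Definition 1 can be made concrete by enumeration. The helpers below are hypothetical names for illustration only, and the example function is the classic two-party comparison (a > b) rather than the three-party program of Fig. 1.

```python
from itertools import product


def indistinguishable_space(f, domain, n, v):
    """The set of input tuples in D^n on which f evaluates to v."""
    return {vs for vs in product(domain, repeat=n) if f(*vs) == v}


def ideal_leakage(f, domain, n, v, corrupted):
    """Leak^I_f(v, v_k): tuples in the indistinguishable space of v whose first
    k entries equal the inputs `corrupted` chosen by the adversary."""
    k = len(corrupted)
    return {vs for vs in indistinguishable_space(f, domain, n, v)
            if vs[:k] == tuple(corrupted)}


def greater(a, b):
    """Two-party comparison over a toy domain: 1 iff a > b."""
    return int(a > b)


if __name__ == "__main__":
    # Party 1 is corrupted and chose the input 2; the result is 1 (a > b).
    print(sorted(ideal_leakage(greater, range(3), 2, 1, (2,))))  # [(2, 0), (2, 1)]
    # Party 1 chose 1; the result is 0 (a <= b).
    print(sorted(ideal_leakage(greater, range(3), 2, 0, (1,))))  # [(1, 1), (1, 2)]
```

In the first case the adversary learns only that the honest input is 0 or 1, which is exactly what the result and her own input imply.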


MPC in the Real-World. An MPC application in the real world is implemented using some MPC protocol π (denoted by $\pi_f$), by which all the parties collaboratively compute $\pi_f(\vec{x})$ over their private inputs $\vec{v}$ without any TTP T. An introduction to MPC protocols can be found in [14].

There are generally two types of adversaries in the real world, i.e., semi- 
honest and malicious. An adversary is semi-honest (a.k.a. passive) if the cor- 
rupted parties run the protocol honestly as specified, but may try to learn private 
information of other parties by observing the protocol execution (i.e., network 
messages and program states). An adversary is malicious (a.k.a. active) if the 
corrupted parties can deviate arbitrarily from the prescribed protocol (e.g., con- 
trol, manipulate, and inject messages) in an attempt to learn private information 
of the other parties. In this work, we consider semi-honest adversaries, which are 
supported by most MPC frameworks and often serve as a basis for MPC in more 
robust settings with powerful adversaries. 

A protocol π is (semi-honest) secure if whatever a (semi-honest) adversary can achieve in the real world can also be achieved by a corresponding adversary in the ideal world. Semi-honest security ensures that the corrupted parties learn no more information from executing the protocol than what they can learn from the result and the private inputs of the corrupted parties. Therefore, the leakage of an MPC application $f(\vec{x})$ in the real world against a semi-honest adversary can also be characterized using the indistinguishability of private inputs.


Definition 2. An MPC protocol π is (semi-honest) secure if for any MPC application $f(\vec{x}_n)$, adversary-chosen private inputs $\vec{v}_k \in D^k$ and result $v \in D$, the leakage of computing $v = \pi_f(\vec{v}_n)$ is $\mathsf{Leak}^{I}_{f}(v, \vec{v}_k)$.


4 Language-Level Leakage Characterization 


In this section, we characterize the leakage of MPC applications from the lan- 
guage perspective. 


4.1 A Language for MPC 


We consider a simple language WHILE for implementing MPC applications. The syntax of WHILE programs is defined as follows.

$$p ::= \mathtt{skip} \mid x = e \mid p_1; p_2 \mid \mathtt{if}\ x\ \mathtt{then}\ p_1\ \mathtt{else}\ p_2 \mid \mathtt{return}\ x \mid \mathtt{while}\ x\ \mathtt{do}\ p \mid \mathtt{repeat}\ n\ \mathtt{do}\ p$$

where e is an expression defined as usual and n is a positive integer.

Despite its simplicity, WHILE suffices to illustrate our approach, while our tool supports a real-world language. Note that we introduce two loop constructs. The while loop can only be used with secret-independent conditions, whereas the repeat loop (with a fixed number n of iterations) can have secret-dependent conditions. The restriction on the while loop is necessary because the adversary observes when the loop terminates, so secret information may be leaked if a secret-dependent condition is used [44].

The operational semantics of WHILE programs is defined in a standard way (cf. [15]). In particular, repeat n do p repeats the loop body p a fixed number n of times. A configuration is a tuple (p, σ), where p denotes a statement and $\sigma : \mathcal{X} \to D$ denotes a state that maps variables to values. The evaluation of an expression e under a state σ is denoted by σ(e). A transition from (p, σ) to (p', σ') is denoted by $(p, \sigma) \to (p', \sigma')$, and $\to^*$ denotes the transitive closure of →. An execution starting from the configuration (p, σ) is a sequence of configurations. We write $(p, \sigma) \Downarrow \sigma'$ if $(p, \sigma) \to^* (\mathtt{skip}, \sigma')$. We assume that each execution ends in a return statement, i.e., all the while loops always terminate. We denote by $(p, \sigma) \Downarrow \sigma' : v$ the execution returning value v.


4.2 Leakage Characterization in Ideal/Real-World 


An MPC application f(X) is implemented as a WHILE program p. An execution 
of the program p evaluates the computation f(X) as if a TTP directly executed 
the program p on the private inputs. In this setting, the adversary cannot observe 
any intermediate states of the execution other than the final result. 


PoS4MPC: Automated Security Policy Synthesis 391 


Let $\mathcal{X}^{\mathsf{In}} = \{x_1, \dots, x_n\} \subseteq \mathcal{X}$ be the set of private input variables. We denote by $\mathit{State}_0$ the set of the initial states. Given a tuple of values $\vec{v}_k \in D^k$ and a result $v \in D$, let $\mathsf{Leak}^{p}_{I}(v, \vec{v}_k)$ denote the set of states $\sigma \in \mathit{State}_0$ such that $(p, \sigma) \Downarrow \sigma' : v$ for some state σ' and $\sigma(x_i) = v_i$ for $1 \le i \le k$. Intuitively, when the adversary controls the parties $P_1, \dots, P_k$, she learns the set of states $\mathsf{Leak}^{p}_{I}(v, \vec{v}_k)$ from the result v and the adversary-chosen private inputs $\vec{v}_k \in D^k$. We can reformulate the leakage of an MPC application $f(\vec{x})$ in the ideal world (cf. Definition 1) as follows.


Proposition 1. Given an MPC application $f(\vec{x}_n)$ implemented by a program p, $\vec{v}_n' \in \mathsf{Leak}^{I}_{f}(v, \vec{v}_k)$ iff there exists a state $\sigma \in \mathsf{Leak}^{p}_{I}(v, \vec{v}_k)$ such that $\sigma(x_i) = v_i'$ for $1 \le i \le n$.


We use security policies to characterize the leakage of MPC applications in 
the real-world. 


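Security Level. We consider a lattice of security levels $\mathbb{L} = \{\mathsf{Sec}, \mathsf{Pub}\}$ with $\mathsf{Pub} \sqsubseteq \mathsf{Pub}$, $\mathsf{Pub} \sqsubseteq \mathsf{Sec}$, $\mathsf{Sec} \sqsubseteq \mathsf{Sec}$ and $\mathsf{Sec} \not\sqsubseteq \mathsf{Pub}$. We denote by $\ell_1 \sqcup \ell_2$ the least upper bound of two security levels $\ell_1, \ell_2 \in \mathbb{L}$, namely, $\ell \sqcup \mathsf{Sec} = \mathsf{Sec} \sqcup \ell = \mathsf{Sec}$ for $\ell \in \mathbb{L}$ and $\mathsf{Pub} \sqcup \mathsf{Pub} = \mathsf{Pub}$.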


Definition 3. A security policy $\varrho : \mathcal{X} \to \mathbb{L}$ for the MPC application $f(\vec{x})$ is a function that associates each variable $x \in \mathcal{X}$ with a security level $\ell \in \mathbb{L}$.


Given a security policy $\varrho$ and a security level $\ell \in \mathbb{L}$, let $\mathcal{X}^{\ell} := \{x \mid \varrho(x) = \ell\} \subseteq \mathcal{X}$, i.e., the set of variables with the security level ℓ under $\varrho$. We lift the order ⊑ to security policies, namely, $\varrho \sqsubseteq \varrho'$ if $\varrho(x) \sqsubseteq \varrho'(x)$ for each $x \in \mathcal{X}$. When executing the program p with a security policy $\varrho$ using an MPC protocol π, we assume that the adversary can observe the values of the public variables $x \in \mathcal{X}^{\mathsf{Pub}}$ but not those of the secret variables $x \in \mathcal{X}^{\mathsf{Sec}}$.
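The two-point lattice and the pointwise order on policies are small enough to encode directly. This is an illustrative sketch with hypothetical names; it simply mirrors the definitions above, representing a policy as a dictionary from variable names to levels.

```python
from enum import IntEnum


class Level(IntEnum):
    """Two-point lattice {Pub, Sec} with Pub ⊑ Sec (encoded as 0 < 1)."""
    PUB = 0
    SEC = 1


def leq(l1: Level, l2: Level) -> bool:
    """l1 ⊑ l2; the only failing case is Sec ⊑ Pub."""
    return l1 <= l2


def join(l1: Level, l2: Level) -> Level:
    """Least upper bound l1 ⊔ l2."""
    return max(l1, l2)


def policy_leq(rho1: dict, rho2: dict) -> bool:
    """rho1 ⊑ rho2 iff rho1(x) ⊑ rho2(x) for every variable x."""
    return rho1.keys() == rho2.keys() and all(leq(rho1[x], rho2[x]) for x in rho1)


def policy_join(rho1: dict, rho2: dict) -> dict:
    """Pointwise join (rho1 ⊔ rho2)(x) = rho1(x) ⊔ rho2(x)."""
    return {x: join(rho1[x], rho2[x]) for x in rho1}


def variables_at(rho: dict, level: Level) -> set:
    """X^level: the variables that rho maps to `level`."""
    return {x for x, l in rho.items() if l == level}
```

Encoding Pub and Sec as 0 and 1 turns ⊑ into integer comparison and ⊔ into `max`, which is all a two-element chain needs.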

This is a practical assumption that is well supported by existing frameworks. For instance, Obliv-C [44] allows developers to define an MPC application in an extension of the C language; when compiled and linked, the result is a concrete garbled-circuit protocol $\pi_p$ whose computation does not reveal the values of any oblivious-qualified variables. Thus, all the secret variables specified by the security policy $\varrho$ can be declared as oblivious-qualified variables in Obliv-C, while all the public variables specified by $\varrho$ are declared without oblivious qualification. Similarly, MPyC [37] is a Python package for implementing MPC applications that allows programmers to define instances of secret-typed variable classes using Python's class mechanism. When executing MPC applications, instances of secret-typed classes are protected via Shamir's secret sharing protocol [38]. Thus, all the secret variables specified by $\varrho$ can be declared as instances of secret-typed variable classes in MPyC, while all the public variables specified by $\varrho$ are declared as instances of Python's standard classes.


Leakage Under a Security Policy. Fix a security policy $\varrho$ for the program p. Note that the values of the secret variables are not known to any party even at runtime, as they are encrypted. This means that, unlike secret-independent conditions, secret-dependent conditions cannot be executed normally, and thus should be removed using, e.g., multiplexers, before the program is transformed into circuits. We define the transformation $T_\varrho(\cdot, \cdot)$ as follows, where c is the selector of a multiplexer.


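$$
\begin{aligned}
T_\varrho(c, p_1; p_2) &\triangleq T_\varrho(c, p_1); T_\varrho(c, p_2)\\
T_\varrho(c, \mathtt{return}\ x) &\triangleq \mathtt{return}\ x\\
T_\varrho(c, x = e) &\triangleq x = x + c \times (e - x)\\
T_\varrho(c, \mathtt{skip}) &\triangleq \mathtt{skip}\\
T_\varrho(c, \mathtt{if}\ x\ \mathtt{then}\ p_1\ \mathtt{else}\ p_2) &\triangleq
\begin{cases}
\mathtt{if}\ x\ \mathtt{then}\ T_\varrho(1, p_1)\ \mathtt{else}\ T_\varrho(1, p_2), & \text{if } c = 1 \wedge \varrho(x) = \mathsf{Pub};\\
T_\varrho(c \wedge x, p_1); T_\varrho(c \wedge \neg x, p_2), & \text{otherwise.}
\end{cases}\\
T_\varrho(c, \mathtt{while}\ x\ \mathtt{do}\ p) &\triangleq
\begin{cases}
\mathtt{while}\ x\ \mathtt{do}\ T_\varrho(1, p), & \text{if } c = 1 \wedge \varrho(x) = \mathsf{Pub};\\
\mathtt{error}, & \text{otherwise.}
\end{cases}\\
T_\varrho(c, \mathtt{repeat}\ n\ \mathtt{do}\ p) &\triangleq \mathtt{repeat}\ n\ \mathtt{do}\ T_\varrho(c, p)
\end{aligned}
$$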


Intuitively, c in $T_\varrho(c, \cdot)$ indicates whether the statement is under some secret-dependent branching statement. Initially, c = 1. During the transformation, c is conjoined with the branching condition x or ¬x when transforming $\mathtt{if}\ x\ \mathtt{then}\ p_1\ \mathtt{else}\ p_2$ if x is secret or c ≠ 1, in which case the control flow inside must be protected. If c = 1 and the condition variable x is public, the statement need not be protected. $T_\varrho(c, x = e)$ simulates a multiplexer that selects between two values depending on whether the assignment x = e is in the scope of some secret-dependent conditions: at runtime, the value of e is assigned to x if c is 1; otherwise, x does not change. $T_\varrho(c, \mathtt{while}\ x\ \mathtt{do}\ p)$ enforces that the while loop is used with secret-independent conditions and that x is public in the security policy $\varrho$; otherwise, it throws an error. The other cases are trivial. We denote by $p_\varrho$ the program $T_\varrho(1, p)$, on which we will define the leakage of p in the real world.
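The transformation can be prototyped on a small tuple-based encoding of WHILE programs. In this sketch (all names and the encoding are hypothetical, and expressions are Python functions of the state), the selector c is a runtime function returning 0 or 1, `None` plays the role of the constant 1, and the guarded assignment computes x + c × (e − x), so both branches of a secret conditional are executed but only the selected one takes effect.

```python
from typing import Callable, Optional

State = dict
Selector = Optional[Callable[[State], int]]  # None stands for the constant 1


class TransformError(Exception):
    """Raised when a while loop has a secret or guarded condition."""


def guard(c: Selector, x: str, positive: bool) -> Callable[[State], int]:
    """Selector c ∧ x (positive) or c ∧ ¬x, evaluated to 0 or 1 at runtime."""
    def sel(s: State) -> int:
        outer = 1 if c is None else c(s)
        bit = int(bool(s[x]))
        return outer * (bit if positive else 1 - bit)
    return sel


def transform(rho: dict, c: Selector, p: tuple) -> tuple:
    """T_rho(c, p) on a tuple-encoded WHILE program."""
    kind = p[0]
    if kind in ("skip", "return"):
        return p
    if kind == "assign":  # x = x + c * (e - x)
        _, x, e = p
        return ("mux_assign", x, e, c if c is not None else (lambda s: 1))
    if kind == "seq":
        return ("seq", transform(rho, c, p[1]), transform(rho, c, p[2]))
    if kind == "if":
        _, x, p1, p2 = p
        if c is None and rho[x] == "Pub":
            return ("if", x, transform(rho, None, p1), transform(rho, None, p2))
        return ("seq", transform(rho, guard(c, x, True), p1),
                transform(rho, guard(c, x, False), p2))
    if kind == "while":
        _, x, body = p
        if c is None and rho[x] == "Pub":
            return ("while", x, transform(rho, None, body))
        raise TransformError(f"while loop on {x!r} needs a public, unguarded condition")
    if kind == "repeat":
        _, n, body = p
        return ("repeat", n, transform(rho, c, body))
    raise ValueError(f"unknown statement {kind!r}")


def run(p: tuple, s: State):
    """Execute a transformed program in place; return the value of `return x`, if any."""
    kind = p[0]
    if kind == "skip":
        return None
    if kind == "return":
        return s[p[1]]
    if kind == "mux_assign":
        _, x, e, sel = p
        old = s.get(x, 0)
        s[x] = old + sel(s) * (e(s) - old)
        return None
    if kind == "seq":
        result = run(p[1], s)
        return result if result is not None else run(p[2], s)
    if kind == "if":
        return run(p[2] if s[p[1]] else p[3], s)
    if kind in ("while", "repeat"):
        rounds = range(p[1]) if kind == "repeat" else iter(lambda: bool(s[p[1]]), False)
        for _ in rounds:
            result = run(p[2], s)
            if result is not None:
                return result
        return None
    raise ValueError(f"unknown statement {kind!r}")


if __name__ == "__main__":
    # if s then y = 1 else y = 2; return y   (s secret)
    prog = ("seq", ("if", "s", ("assign", "y", lambda st: 1),
                    ("assign", "y", lambda st: 2)), ("return", "y"))
    secret = transform({"s": "Sec", "y": "Sec"}, None, prog)
    print(run(secret, {"s": 1, "y": 0}), run(secret, {"s": 0, "y": 0}))  # 1 2
```

With s secret, the transformed program contains no `if`; both assignments run, and the selector decides which one changes y.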

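For every state $\sigma : \mathcal{X} \to D$, let $\sigma^{\mathsf{Pub}} : \mathcal{X}^{\mathsf{Pub}} \to D$ denote the projection of σ onto the public variables $\mathcal{X}^{\mathsf{Pub}}$. For each execution $(p_\varrho, \sigma_1) \Downarrow \sigma_2$, we denote by $(p_\varrho, \sigma_1) \Downarrow^{\mathsf{Pub}} \sigma_2$ the sequence of configurations where each state σ is replaced by $\sigma^{\mathsf{Pub}}$.

Recall that the adversary can observe the values of public variables $x \in \mathcal{X}^{\mathsf{Pub}}$ when executing the program $p_\varrho$. Thus, from an execution $(p_\varrho, \sigma_1) \Downarrow \sigma_2 : v$, she can observe the sequence $(p_\varrho, \sigma_1) \Downarrow^{\mathsf{Pub}} \sigma_2$ and the result v, written as $(p_\varrho, \sigma_1) \Downarrow^{\mathsf{Pub}} \sigma_2 : v$. For every state $\sigma \in \mathsf{Leak}^{p}_{I}(v, \vec{v}_k)$, we denote by $\mathsf{Leak}^{p,\varrho}(v, \sigma)$ the set of states $\sigma' \in \mathsf{Leak}^{p}_{I}(v, \vec{v}_k)$ such that $(p_\varrho, \sigma') \Downarrow^{\mathsf{Pub}} \sigma_1' : v$ and $(p_\varrho, \sigma) \Downarrow^{\mathsf{Pub}} \sigma_1 : v$ are identical.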
Definition 4. A security policy $\varrho$ is perfect for a given MPC application $f(\vec{x}_n)$ implemented by the program p, denoted by $\varrho \models_p f(\vec{x}_n)$, if $T_\varrho(1, p)$ does not throw any errors, and for all adversary-chosen private inputs $\vec{v}_k \in D^k$, results $v \in D$, and states $\sigma \in \mathsf{Leak}^{p}_{I}(v, \vec{v}_k)$, we have that

$$\mathsf{Leak}^{p}_{I}(v, \vec{v}_k) = \mathsf{Leak}^{p,\varrho}(v, \sigma).$$


Intuitively, a perfect security policy $\varrho$ ensures that for every state $\sigma \in \mathsf{Leak}^{p}_{I}(v, \vec{v}_k)$, from the observation $(p_\varrho, \sigma) \Downarrow^{\mathsf{Pub}} \sigma' : v$, the adversary learns only the same set $\mathsf{Leak}^{p}_{I}(v, \vec{v}_k)$ of initial states as in the ideal world.
Our goal is to compute a perfect security policy $\varrho$ for every program p that implements the MPC application $f(\vec{x})$. A naive way is to assign the high security level Sec to all the variables in $\mathcal{X}$, which may, however, lead to lower performance, as all the intermediate computations have to be performed on encrypted data and conditional statements have to be removed. Ideally, a security policy $\varrho$ should not only be perfect but also annotate as few secret variables as possible.


5 Type System 


In this section, we present a sound type system to automatically infer perfect security policies. We first define noninterference of a program p w.r.t. a security policy $\varrho$, which is shown to entail the perfectness of $\varrho$.


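Definition 5. A program p is noninterfering w.r.t. a security policy $\varrho$, written as $\varrho$-noninterfering, if $T_\varrho(1, p)$ does not throw any errors and $(p_\varrho, \sigma_1) \Downarrow^{\mathsf{Pub}} \sigma_2 : v$ and $(p_\varrho, \sigma_1') \Downarrow^{\mathsf{Pub}} \sigma_2' : v$ are the same for each pair of states $\sigma_1, \sigma_1' \in \mathit{State}_0$ whose executions return the same value v.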


Intuitively, $\varrho$-noninterference ensures that for all private inputs of the
n parties (without the adversary-chosen private inputs), the adversary observes 
the same sequence of the configurations from all the executions that return the 
same value. 

The $\varrho$-noninterference of p entails the perfectness of $\varrho$, where the adversary can choose arbitrary private inputs $\vec{v}_k \in D^k$ of the corrupted participants $(P_1, \dots, P_k)$ for any $k \ge 1$.


Proposition 2. If p is $\varrho$-noninterfering for a security policy $\varrho$, then $\varrho \models_p f(\vec{x})$.


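Note that the converse of Proposition 2 does not necessarily hold due to the adversary-chosen private inputs. For instance, suppose $(p_\varrho, \sigma_1) \Downarrow^{\mathsf{Pub}} \sigma_2 : v$ and $(p_\varrho, \sigma_1') \Downarrow^{\mathsf{Pub}} \sigma_2' : v$ are identical for every pair of states $\sigma_1, \sigma_1' \in \mathsf{Leak}^{p}_{I}(v, v_1)$, and $(p_\varrho, \sigma_3) \Downarrow^{\mathsf{Pub}} \sigma_4 : v$ and $(p_\varrho, \sigma_3') \Downarrow^{\mathsf{Pub}} \sigma_4' : v$ are identical for every pair of states $\sigma_3, \sigma_3' \in \mathsf{Leak}^{p}_{I}(v, v_1')$. If $v_1 \neq v_1'$, then $(p_\varrho, \sigma_1) \Downarrow^{\mathsf{Pub}} \sigma_2 : v$ and $(p_\varrho, \sigma_3) \Downarrow^{\mathsf{Pub}} \sigma_4 : v$ can be different, in which case p is not $\varrho$-noninterfering.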

Based on Proposition 2, we present a type system for inferring a perfect security policy $\varrho$ of a given program p such that p is $\varrho$-noninterfering. The typing judgement has the form $c \vdash p : \varrho \Rightarrow \varrho'$, where the type contexts $\varrho, \varrho'$ are security policies, p is the program under typing, and c is the security level of the current control flow. The typing judgement $c \vdash p : \varrho \Rightarrow \varrho'$ states that, given the security level c of the current control flow and the type context $\varrho$, the statement p is typable and yields a new updated type context $\varrho'$.

The type inference rules are shown in Fig. 3; they track the security levels of both data- and control-flow of information from private inputs. Here, $\varrho(e)$ denotes the least upper bound of the security levels $\varrho(x)$ of the variables x used in the expression e, and $\varrho_1 \sqcup \varrho_2$ is the security policy such that $(\varrho_1 \sqcup \varrho_2)(x) = \varrho_1(x) \sqcup \varrho_2(x)$ for every variable $x \in \mathcal{X}$. $\mathsf{lfp}(c, n, \varrho, p)$ is $\varrho$ if $n = 0$ or $\varrho' = \varrho$, and $\mathsf{lfp}(c, n - 1, \varrho', p)$ otherwise, where $c \vdash p : \varrho \Rightarrow \varrho'$. Note that constants have the security level Pub. Most of these rules are standard.
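$$
\frac{}{c \vdash \mathtt{skip} : \varrho \Rightarrow \varrho}\ [\textsc{T-Skip}]
\qquad
\frac{\varrho' = \varrho[x \mapsto c \sqcup \varrho(e)]}{c \vdash x = e : \varrho \Rightarrow \varrho'}\ [\textsc{T-Assign}]
$$
$$
\frac{c \vdash p_1 : \varrho \Rightarrow \varrho_1 \quad c \vdash p_2 : \varrho_1 \Rightarrow \varrho_2}{c \vdash p_1; p_2 : \varrho \Rightarrow \varrho_2}\ [\textsc{T-Seq}]
\qquad
\frac{c \sqcup \varrho(x) \vdash p_1 : \varrho \Rightarrow \varrho_1 \quad c \sqcup \varrho(x) \vdash p_2 : \varrho \Rightarrow \varrho_2 \quad \varrho' = \varrho_1 \sqcup \varrho_2}{c \vdash \mathtt{if}\ x\ \mathtt{then}\ p_1\ \mathtt{else}\ p_2 : \varrho \Rightarrow \varrho'}\ [\textsc{T-If}]
$$
$$
\frac{}{c \vdash \mathtt{return}\ x : \varrho \Rightarrow \varrho}\ [\textsc{T-Return}]
\qquad
\frac{\varrho' = \mathsf{lfp}(c, n, \varrho, p)}{c \vdash \mathtt{repeat}\ n\ \mathtt{do}\ p : \varrho \Rightarrow \varrho'}\ [\textsc{T-Repeat}]
$$
$$
\frac{\varrho(x) = \mathsf{Pub} \quad c = \mathsf{Pub} \quad \varrho' = \mathsf{lfp}(\mathsf{Pub}, -1, \varrho, p)}{c \vdash \mathtt{while}\ x\ \mathtt{do}\ p : \varrho \Rightarrow \varrho'}\ [\textsc{T-While}]
$$

Fig. 3. Type inference rules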


Rule T-Assign prevents data-flow and control-flow of information from the security level Sec to the security level Pub. To meet this constraint, the security level of the variable x is updated to the least upper bound $c \sqcup \varrho(e)$ of the security levels of the current control flow c and the variables used in the expression e. Rule T-If passes the security level $c \sqcup \varrho(x)$ into both branches, which prevents variables assigned in those two branches from remaining public when this level is Sec. Rule T-While requires that the loop condition be public and that the loop be used with secret-independent conditions, ensuring that $T_\varrho(1, p)$ does not throw any errors. Rule T-Return does not impose any constraints on x, as the return value is observable to the adversary.

Let $\varrho_0 : \mathcal{X} \to \mathbb{L}$ be the mapping such that $\varrho_0(x) = \mathsf{Sec}$ for all $x \in \mathcal{X}^{\mathsf{In}}$ and $\varrho_0(x) = \mathsf{Pub}$ otherwise. If the typing judgement $\mathsf{Pub} \vdash p : \varrho_0 \Rightarrow \varrho$ is valid, then the values of all the public variables specified by $\varrho$ do not depend on any values of private inputs. Thus, it is straightforward to obtain:


Proposition 3. If the typing judgement $\mathsf{Pub} \vdash p : \varrho_0 \Rightarrow \varrho$ is valid, then the program p is $\varrho$-noninterfering.


From Proposition 2 and Proposition 3, we have:


Corollary 1. If $\mathsf{Pub} \vdash p : \varrho_0 \Rightarrow \varrho$ is valid, then $\varrho$ is perfect, i.e., $\varrho \models_p f(\vec{x})$.
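The rules in Fig. 3 translate almost directly into a recursive inference procedure. The sketch below is our illustration, not the PoS4MPC implementation: programs are tuple-encoded, an expression is represented only by the set of variables it reads, and `lfp` joins each iterate with the previous context (an assumption that guarantees termination); the extra check that a while condition stays public after the fixpoint is also our addition. Starting from ϱ0, the motivating example is typed with every variable secret, as stated in Sect. 2.

```python
PUB, SEC = 0, 1  # Pub ⊑ Sec; max() is the join


class UntypableError(Exception):
    """Raised when rule T-While does not apply."""


def level(rho: dict, used: set) -> int:
    """rho(e): join of the levels of the variables read by e; constants are Pub."""
    return max((rho[v] for v in used), default=PUB)


def join(r1: dict, r2: dict) -> dict:
    return {x: max(r1[x], r2[x]) for x in r1}


def infer(c: int, p: tuple, rho: dict) -> dict:
    """Return rho' such that c ⊢ p : rho ⇒ rho'."""
    kind = p[0]
    if kind in ("skip", "return"):            # T-Skip, T-Return
        return dict(rho)
    if kind == "assign":                      # T-Assign
        _, x, used = p
        return {**rho, x: max(c, level(rho, used))}
    if kind == "seq":                         # T-Seq
        return infer(c, p[2], infer(c, p[1], rho))
    if kind == "if":                          # T-If
        _, x, p1, p2 = p
        inner = max(c, rho[x])
        return join(infer(inner, p1, rho), infer(inner, p2, rho))
    if kind == "repeat":                      # T-Repeat
        _, n, body = p
        return lfp(c, n, rho, body)
    if kind == "while":                       # T-While
        _, x, body = p
        if c != PUB or rho[x] != PUB:
            raise UntypableError(f"loop condition {x!r} must be public and unguarded")
        out = lfp(PUB, -1, rho, body)
        if out[x] != PUB:  # extra check added in this sketch
            raise UntypableError(f"loop condition {x!r} becomes secret in the body")
        return out
    raise ValueError(f"unknown statement {kind!r}")


def lfp(c: int, n: int, rho: dict, body: tuple) -> dict:
    """Type the body at most n times (n = -1: until stable), joining each
    result with the previous context so the sequence only grows."""
    while n != 0:
        nxt = join(rho, infer(c, body, rho))
        if nxt == rho:
            return rho
        rho, n = nxt, n - 1
    return rho


if __name__ == "__main__":
    # Motivating example; the c2 of the else-branch is renamed to c3.
    then_ = ("seq", ("assign", "c2", {"a", "c"}),
             ("if", "c2", ("assign", "r", set()), ("assign", "r", set())))
    else_ = ("seq", ("assign", "c3", {"b", "c"}),
             ("if", "c3", ("assign", "r", set()), ("assign", "r", set())))
    prog = ("seq", ("assign", "c1", {"a", "b"}),
            ("seq", ("if", "c1", then_, else_), ("return", "r")))
    rho0 = {"a": SEC, "b": SEC, "c": SEC, "c1": PUB, "c2": PUB, "c3": PUB, "r": PUB}
    print(infer(PUB, prog, rho0))  # every variable ends up SEC (= 1)
```

The result is sound but conservative: c2 and c3 are marked secret even though, as argued in Sect. 2, they could be declassified.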


6 Degrading Security Levels 


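The type system allows us to infer a security policy $\varrho$ such that the typing judgement $\mathsf{Pub} \vdash p : \varrho_0 \Rightarrow \varrho$ is valid, from which we can deduce that $\varrho \models_p f(\vec{x})$, i.e., $\varrho$ is perfect for the MPC application $f(\vec{x})$ implemented by the program p. However, the security policy $\varrho$ may be too conservative, i.e., some secret variables specified by $\varrho$ can be declassified without compromising security. In this section, we propose an automated approach to identify these variables. We mainly consider minimizing the number of secret branching variables, viz., the secret variables used in branching conditions, as they usually incur a high computation and communication overhead. W.l.o.g., we assume that for each secret branching variable x there is only one assignment to x and it is used only in one conditional statement. (We can rename variables in p if this assumption does not hold, where the renamed variables have the same security levels as their original names.) With this assumption, whether x can be declassified depends only on the unique conditional statement where it occurs.

$$
[x = e, \alpha, \varphi] \to [\mathtt{skip}, \alpha[x \mapsto \alpha(e)], \varphi]
\qquad
[\mathtt{return}\ x, \alpha, \varphi] \to [\mathtt{skip}, \alpha, \varphi]
$$
$$
\frac{[p_1, \alpha_1, \varphi_1] \to [\mathtt{skip}, \alpha_2, \varphi_2] \quad [p_2, \alpha_2, \varphi_2] \to [p_2', \alpha_3, \varphi_3]}{[p_1; p_2, \alpha_1, \varphi_1] \to [p_2', \alpha_3, \varphi_3]}
\qquad
\frac{[p_1, \alpha_1, \varphi_1] \to [p_1', \alpha_2, \varphi_2] \quad p_1' \neq \mathtt{skip}}{[p_1; p_2, \alpha_1, \varphi_1] \to [p_1'; p_2, \alpha_2, \varphi_2]}
$$
$$
\frac{\mathrm{SAT}(\varphi') \quad \varphi' = \varphi \wedge \alpha(x)}{[\mathtt{if}\ x\ \mathtt{then}\ p_1\ \mathtt{else}\ p_2, \alpha, \varphi] \to [p_1, \alpha, \varphi']}
\qquad
\frac{\mathrm{SAT}(\varphi') \quad \varphi' = \varphi \wedge \neg\alpha(x)}{[\mathtt{if}\ x\ \mathtt{then}\ p_1\ \mathtt{else}\ p_2, \alpha, \varphi] \to [p_2, \alpha, \varphi']}
$$
$$
\frac{\mathrm{SAT}(\varphi') \quad \varphi' = \varphi \wedge \alpha(x) \quad p' = p; \mathtt{while}\ x\ \mathtt{do}\ p}{[\mathtt{while}\ x\ \mathtt{do}\ p, \alpha, \varphi] \to [p', \alpha, \varphi']}
\qquad
\frac{\mathrm{SAT}(\varphi') \quad \varphi' = \varphi \wedge \neg\alpha(x) \quad p' = \mathtt{skip}}{[\mathtt{while}\ x\ \mathtt{do}\ p, \alpha, \varphi] \to [p', \alpha, \varphi']}
$$
$$
\frac{p' = (n \ge 1)\ ?\ p; \mathtt{repeat}\ n-1\ \mathtt{do}\ p : \mathtt{skip}}{[\mathtt{repeat}\ n\ \mathtt{do}\ p, \alpha, \varphi] \to [p', \alpha, \varphi]}
$$

Fig. 4. The symbolic semantics of WHILE programs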

Fix a security policy $\varrho$ such that $\varrho \models_p f(\vec{x})$. Suppose that $\mathtt{if}\ x\ \mathtt{then}\ p_1\ \mathtt{else}\ p_2$ is not used with secret-dependent conditions. Let $\varrho'$ be the security policy $\varrho[x \mapsto \mathsf{Pub}]$. It is easy to see that $T_{\varrho'}(1, p)$ does not raise any errors. Therefore, to declassify x, we need to ensure that $(p_{\varrho'}, \sigma) \Downarrow^{\mathsf{Pub}} \sigma_1 : v$ and $(p_{\varrho'}, \sigma') \Downarrow^{\mathsf{Pub}} \sigma_1' : v$ are identical for all adversary-chosen private inputs $\vec{v}_k \in D^k$, results $v \in D$, and states $\sigma, \sigma' \in \mathsf{Leak}^{p}_{I}(v, \vec{v}_k)$. However, as the number of initial states may be large or even infinite, it is infeasible to check all pairs of executions.

We propose to use symbolic executions to represent the potentially infinite sets of (concrete) executions. Each symbolic execution t is associated with a path condition $\varphi$, which denotes the set of initial states satisfying $\varphi$; from each of these states, the execution has the same sequence of statements. Thus, the conjunction $\varphi \wedge e = v$, where e is the symbolic return value and v is a concrete value, represents the set of initial states from which the executions have the same sequence of statements and return the same result v. It is not difficult to see that checking whether x in $\mathtt{if}\ x\ \mathtt{then}\ p_1\ \mathtt{else}\ p_2$ can be declassified amounts to checking whether, for every pair of symbolic executions $t_1$ and $t_2$ that both include $\mathtt{if}\ x\ \mathtt{then}\ p_1\ \mathtt{else}\ p_2$, x has the same truth value in $t_1$ and $t_2$ whenever $t_1$ and $t_2$ return the same value. This can be solved by invoking off-the-shelf SMT solvers.


6.1 Symbolic Semantics 


Let $E$ denote the set of expressions over the private input variables $\vec{x}$ and constants. A path condition $\phi \in E$ is a conjunction of Boolean expressions. A state $\sigma \in \mathsf{State}_0$ satisfies $\phi$, denoted by $\sigma \models \phi$, if $\phi$ evaluates to True under $\sigma$. A symbolic state $\alpha$ is a function that maps program variables to symbolic expressions in $E$. $\alpha(e)$ denotes the symbolic value of the expression $e$ under $\alpha$, obtained from $e$ by replacing each occurrence of a variable $x$ by $\alpha(x)$. The initial symbolic state, denoted by $\alpha_0$, is the identity function over the private input variables $\vec{x}$.

The symbolic semantics of WHILE programs is defined by transitions between symbolic configurations, as shown in Fig. 4, where $\mathrm{SAT}(\phi)$ is True iff the constraint $\phi$ is satisfiable. A symbolic configuration is a tuple $[p, \alpha, \phi]$, where $p$ is a statement, $\alpha$ is a symbolic state, and $\phi$ is the path condition that must be satisfied to reach $[p, \alpha, \phi]$. $[p, \alpha, \phi] \hookrightarrow [p', \alpha', \phi']$ denotes a transition from $[p, \alpha, \phi]$ to $[p', \alpha', \phi']$. The symbolic semantics is almost the same as the operational semantics, except that (1) path conditions are collected and checked for conditional statements and while loops, and (2) a transition may be nondeterministic if both $\phi \wedge \alpha(x)$ and $\phi \wedge \neg\alpha(x)$ are satisfiable.

We denote by $\hookrightarrow^{*}$ the transitive closure of $\hookrightarrow$, where the path condition of a multi-step transition is the conjunction of the path conditions of its individual transitions. A symbolic execution starting from a symbolic configuration $[p, \alpha, \phi]$ is a sequence of symbolic configurations, written as $[p, \alpha, \phi] \Downarrow (\alpha', \phi')$, if $[p, \alpha, \phi] \hookrightarrow^{*} [\texttt{skip}, \alpha', \phi']$. Moreover, we denote by $[p, \alpha, \phi] \Downarrow (\alpha', \phi') : e$ the symbolic execution $[p, \alpha, \phi] \Downarrow (\alpha', \phi')$ with the symbolic return value $e$. We denote by $\mathsf{SymExe}$ the set of all the symbolic executions $[p, \alpha_0, \mathrm{True}] \Downarrow (\alpha, \phi) : e$ of the program $p$. Note that $\alpha_0$ is the initial symbolic state. Recall that we assumed all (concrete) executions terminate; thus $\mathsf{SymExe}$ is a finite set of finite sequences of symbolic configurations.
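The rules of Fig. 4 and the definition of $\mathsf{SymExe}$ can be illustrated by a minimal Python sketch. It is not PoS4MPC's implementation (the tool obtains symbolic executions from KLEE); it assumes a toy statement encoding (hypothetical `assign`, `if` and `repeat` tuples, with `while` omitted), models symbolic expressions as closures over the initial state, and decides $\mathrm{SAT}(\phi)$ by brute-force enumeration over a small finite domain instead of calling an SMT solver. The helper names `const`, `gt`, `sat`, `sym_exec` and `sym_exe_set` are illustrative only.

```python
from itertools import product

# Symbolic expressions are modelled as closures from an initial state
# (a dict of private inputs) to a value: a stand-in for real SMT terms.
def const(v):
    return lambda s: v

def gt(e1, e2):
    return lambda s: e1(s) > e2(s)

def sat(phi, inputs, domain):
    """SAT(phi): some initial state over `domain` satisfies every conjunct."""
    for vals in product(domain, repeat=len(inputs)):
        s = dict(zip(inputs, vals))
        if all(c(s) for c in phi):
            return True
    return False

def sym_exec(prog, alpha, phi, inputs, domain):
    """Yield the final (alpha, phi) of every feasible path of `prog`."""
    if not prog:                          # configuration [skip, alpha, phi]
        yield alpha, phi
        return
    stmt, rest = prog[0], prog[1:]
    if stmt[0] == "assign":               # ("assign", y, f): alpha[y] := f(alpha)
        _, y, f = stmt
        alpha2 = dict(alpha)
        alpha2[y] = f(alpha)
        yield from sym_exec(rest, alpha2, phi, inputs, domain)
    elif stmt[0] == "if":                 # ("if", x, p1, p2)
        _, x, p1, p2 = stmt
        cond = alpha[x]
        neg = lambda s: not cond(s)
        for branch, lit in ((p1, cond), (p2, neg)):
            phi2 = phi + [lit]            # phi' = phi /\ alpha(x) or phi /\ not alpha(x)
            if sat(phi2, inputs, domain): # follow the branch only if SAT(phi')
                yield from sym_exec(branch + rest, alpha, phi2, inputs, domain)
    elif stmt[0] == "repeat":             # ("repeat", n, body): public bound, unrolled
        _, n, body = stmt
        yield from sym_exec(body * n + rest, alpha, phi, inputs, domain)
    else:
        raise ValueError(f"unknown statement {stmt!r}")

def sym_exe_set(prog, inputs, ret, domain):
    """SymExe as a list of (path condition, symbolic return value) pairs."""
    alpha0 = {x: (lambda s, x=x: s[x]) for x in inputs}   # identity on the inputs
    return [(phi, alpha[ret])
            for alpha, phi in sym_exec(prog, alpha0, [], inputs, domain)]

# Usage: c1 and c2 are the same test, so the path returning 2 is infeasible.
prog = [
    ("assign", "c1", lambda a: gt(a["a"], a["b"])),
    ("assign", "c2", lambda a: gt(a["a"], a["b"])),
    ("if", "c1",
     [("if", "c2", [("assign", "r", lambda a: const(1))],
                   [("assign", "r", lambda a: const(2))])],
     [("assign", "r", lambda a: const(3))]),
]
paths = sym_exe_set(prog, ["a", "b"], "r", range(3))
print(len(paths))  # prints 2
```

The inner else-branch would need both $a > b$ and $\neg(a > b)$, so its $\mathrm{SAT}(\phi')$ premise fails and the path is pruned, which is the effect of the SAT premises in Fig. 4.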


6.2 Relating Symbolic Executions to Concrete Executions 


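A symbolic execution $t = [p, \alpha_0, \mathrm{True}] \Downarrow (\alpha, \phi) : e$ represents the set of (concrete) executions starting from the states $\sigma \in \mathsf{State}_0$ such that $\sigma \models \phi$. Formally, consider $\sigma \in \mathsf{State}_0$ such that $\sigma \models \phi$. By concretizing the symbolic value of each variable $x$ in each symbolic state $\alpha'$ with the concrete value $\sigma(\alpha'(x))$ and projecting out all the path conditions, the symbolic execution $t$ becomes the execution $\langle p, \sigma\rangle \Downarrow \sigma' : \sigma(e)$, written as $\sigma(t)$. For each execution $\langle p, \sigma\rangle \Downarrow \sigma' : v$, there is a unique symbolic execution $t$ such that $\sigma(t) = \langle p, \sigma\rangle \Downarrow \sigma' : v$, and a unique execution $\langle p_{\varrho}, \sigma\rangle \Downarrow_{\varrho} \sigma' : v$ in the program $p_{\varrho}$. We denote by $\mathrm{RW}^{\sigma}_{\varrho}(t)$ this execution $\langle p_{\varrho}, \sigma\rangle \Downarrow_{\varrho} \sigma' : v$, regarded as a sequence.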

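For every adversary-chosen private input $\vec{v}_k \in \mathbb{D}^*$, result $v \in \mathbb{D}$, and initial state $\sigma \in \mathrm{Leak}_p(v, \vec{v}_k)$, we can reformulate the set $\mathrm{Leak}^{\varrho}_p(v, \sigma)$ as follows. (Recall that $\mathrm{Leak}^{\varrho}_p(v, \sigma)$ is the set of states $\sigma' \in \mathrm{Leak}_p(v, \vec{v}_k)$ such that $\langle p_{\varrho}, \sigma'\rangle \Downarrow_{\varrho} \sigma_1' : v$ and $\langle p_{\varrho}, \sigma\rangle \Downarrow_{\varrho} \sigma_1 : v$ are identical.)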


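Proposition 4. For each state $\sigma' \in \mathrm{Leak}_p(v, \vec{v}_k)$, $\sigma' \in \mathrm{Leak}^{\varrho}_p(v, \sigma)$ iff for every symbolic execution $t' = [p, \alpha_0, \mathrm{True}] \Downarrow (\alpha', \phi') : e' \in \mathsf{SymExe}$ such that $\sigma' \models \phi' \wedge e' = v$, $\mathrm{RW}^{\sigma}_{\varrho}(t)$ and $\mathrm{RW}^{\sigma'}_{\varrho}(t')$ are identical, where $t$ is a symbolic execution $[p, \alpha_0, \mathrm{True}] \Downarrow (\alpha, \phi) : e$ such that $\sigma \models \phi \wedge e = v$.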


Proposition 4 allows us to consider only the symbolic executions $[p, \alpha_0, \mathrm{True}] \Downarrow (\alpha, \phi) : e \in \mathsf{SymExe}$ such that $\sigma \models \phi \wedge e = v$ when checking whether $\varrho$ is perfect.


6.3 Reasoning About Symbolic Executions 


We leverage Proposition 4 to identify secret variables that can be declassified without compromising security, by reasoning about symbolic executions. For each expression $\phi \in E$, $\mathrm{Primed}(\phi)$ denotes the "primed" expression obtained from $\phi$ by replacing each private input variable $x_i$ with $x_i'$ (i.e., its primed version).

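Consider two symbolic executions $t = [p, \alpha_0, \mathrm{True}] \Downarrow (\alpha, \phi) : e$ and $t' = [p, \alpha_0, \mathrm{True}] \Downarrow (\alpha', \phi') : e'$. Assume $\texttt{if } x \texttt{ then } p' \texttt{ else } p''$ is not used with any secret-dependent conditions. Recall that we assumed $x$ is used only in $\texttt{if } x \texttt{ then } p' \texttt{ else } p''$.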


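Then, $t$ and $t'$ execute the same subsequence (say $p_1, \ldots, p_m$) of the statements that are $\texttt{if } x \texttt{ then } p' \texttt{ else } p''$. Let $e_1, \ldots, e_m$ (resp. $e_1', \ldots, e_m'$) be the symbolic values of $x$ when executing $p_1, \ldots, p_m$ in the symbolic execution $t$ (resp. $t'$). Define the constraint $\Psi_x(t, t')$ as

$$
\Psi_x(t, t') \triangleq \big(\phi \wedge \mathrm{Primed}(\phi') \wedge e = \mathrm{Primed}(e')\big) \Rightarrow \bigwedge_{i=1}^{m} \big(e_i = \mathrm{Primed}(e_i')\big)
$$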


Intuitively, $\Psi_x(t, t')$ asserts that for every pair of states $\sigma, \sigma' \in \mathsf{State}_0$, if $\sigma$ (resp. $\sigma'$) satisfies the path condition $\phi$ (resp. $\phi'$) and $\sigma(e)$ and $\sigma'(e')$ are identical, then for each $1 \leq i \leq m$, the value of $x$ is the same when executing the conditional statement $p_i$ in both $\mathrm{RW}^{\sigma}_{\varrho}(t)$ and $\mathrm{RW}^{\sigma'}_{\varrho}(t')$.


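Proposition 5. For each pair of states $\sigma, \sigma' \in \mathrm{Leak}_p(v, \vec{v}_k)$ such that $\sigma \models \phi \wedge e = v$ and $\sigma' \models \phi' \wedge e' = v$, if $\Psi_x(t, t')$ is valid and $\mathrm{RW}^{\sigma}_{\varrho}(t)$ and $\mathrm{RW}^{\sigma'}_{\varrho}(t')$ are identical, then $\mathrm{RW}^{\sigma}_{\varrho'}(t)$ and $\mathrm{RW}^{\sigma'}_{\varrho'}(t')$ are identical, where $\varrho' = \varrho[x \mapsto \mathrm{Pub}]$.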




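Recall that $x$ can be declassified in a perfect security policy $\varrho$ if $\varrho' = \varrho[x \mapsto \mathrm{Pub}]$ is still perfect, namely, $\langle p_{\varrho'}, \sigma'\rangle \Downarrow_{\varrho'} \sigma_1' : v$ and $\langle p_{\varrho'}, \sigma\rangle \Downarrow_{\varrho'} \sigma_1 : v$ are identical for every adversary-chosen private input $\vec{v}_k \in \mathbb{D}^*$, result $v \in \mathbb{D}$, and pair of states $\sigma, \sigma' \in \mathrm{Leak}_p(v, \vec{v}_k)$. By Proposition 5, if $\Psi_x(t, t')$ is valid for each pair of symbolic executions $t, t' \in \mathsf{SymExe}$, we can deduce that $\varrho'$ is still perfect.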


Theorem 1. If $\varrho \models_p f(\vec{x})$ and $\Psi_x(t, t')$ is valid for each pair of symbolic executions $t, t' \in \mathsf{SymExe}$, then $\varrho[x \mapsto \mathrm{Pub}] \models_p f(\vec{x})$.


Example 1. Consider two symbolic executions $t$ and $t'$ in the motivating example such that the path condition $\phi$ (resp. $\phi'$) of $t$ (resp. $t'$) is $a > b \wedge c > a$ (resp. $a < b \wedge c > b$), and both return the result 3. The secret branching variable c2 has the symbolic value $c > a$ (resp. $c > b$) in $t$ (resp. $t'$). Then


$$
\Psi_{c2}(t, t') \triangleq (a > b \wedge c > a \wedge a' < b' \wedge c' > b' \wedge 3 = 3) \Rightarrow \big((c > a) = (c' > b')\big).
$$


Obviously, $\Psi_{c2}(t, t')$ is valid. We can show that $\Psi_{c2}(t, t')$ is also valid for every other pair $(t, t')$ of symbolic executions. Therefore, the secret branching variable c2 can be declassified in any perfect security policy $\varrho$.

In contrast, the secret branching variable c1 has the symbolic value $a < b$ in both $t$ and $t'$. Then,


$$
\Psi_{c1}(t, t') \triangleq (a > b \wedge c > a \wedge a' < b' \wedge c' > b' \wedge 3 = 3) \Rightarrow \big((a < b) = (a' < b')\big).
$$


$\Psi_{c1}(t, t')$ is not valid; thus the secret branching variable c1 cannot be declassified.
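The validity check of Example 1 can be sketched by brute force. This Python sketch is only an illustration under stated assumptions: path conditions, return values and the symbolic values of the branching variable are written as closures, $\mathrm{Primed}(\cdot)$ is simulated by evaluating $t'$ on a second state $\sigma'$, and validity is approximated by enumerating a small finite domain (the hypothetical `DOMAIN = range(-2, 3)`). A False result is a genuine counterexample, but a True result only means that no counterexample exists in that range; the approach described above hands $\Psi_x(t, t')$ to an off-the-shelf SMT solver instead. The names `states` and `psi_valid` are hypothetical.

```python
from itertools import product

def states(names, domain):
    """All assignments of `domain` values to the private input variables."""
    for vals in product(domain, repeat=len(names)):
        yield dict(zip(names, vals))

def psi_valid(t, t2, names, domain):
    """Brute-force check of Psi_x(t, t') on a finite domain.

    A symbolic execution is (phi, e, xs): path condition, return value and the
    symbolic values of x at its conditional statements, all closures over an
    initial state. Evaluating t' under a second state plays the role of Primed.
    """
    phi, e, xs = t
    phi2, e2, xs2 = t2
    for s in states(names, domain):
        if not phi(s):
            continue
        for s2 in states(names, domain):
            if phi2(s2) and e(s) == e2(s2):            # premise of Psi_x holds
                if any(x(s) != x2(s2) for x, x2 in zip(xs, xs2)):
                    return False                        # counterexample (sigma, sigma')
    return True

NAMES, DOMAIN = ["a", "b", "c"], range(-2, 3)

# Example 1: phi = a > b /\ c > a, phi' = a < b /\ c > b, both return 3.
phi_t = lambda s: s["a"] > s["b"] and s["c"] > s["a"]
phi_t2 = lambda s: s["a"] < s["b"] and s["c"] > s["b"]
ret3 = lambda s: 3

t_c2 = (phi_t, ret3, [lambda s: s["c"] > s["a"]])     # c2 is c > a in t
t2_c2 = (phi_t2, ret3, [lambda s: s["c"] > s["b"]])   # c2 is c > b in t'
t_c1 = (phi_t, ret3, [lambda s: s["a"] < s["b"]])     # c1 is a < b in both
t2_c1 = (phi_t2, ret3, [lambda s: s["a"] < s["b"]])

print(psi_valid(t_c2, t2_c2, NAMES, DOMAIN))  # True
print(psi_valid(t_c1, t2_c1, NAMES, DOMAIN))  # False
```

For c1, the states $a=0, b=-1, c=1$ and $a'=-1, b'=0, c'=1$ satisfy the premise while $a < b$ and $a' < b'$ differ, which mirrors why $\Psi_{c1}(t, t')$ is not valid.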
Fig. 5. The workflow of our tool PoS4MPC


Refinement. Theorem 1 allows us to check whether the secret branching variable $x$ of a conditional statement $\texttt{if } x \texttt{ then } p' \texttt{ else } p''$ that is not used with any secret-dependent conditions can be declassified. Afterwards, if $x$ can be declassified without compromising security, we feed the result back to the type system before checking the next secret branching variable. This allows us to refine the security levels of variables that are updated in branches; namely, the type inference rule T-IF is refined to the following one.


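$$
\frac{c = (\text{can } x \text{ be declassified}\ ?\ \mathrm{Pub} : \varrho(x)) \qquad c \sqcup c' \vdash p_1 : \varrho \Rightarrow \varrho_1 \qquad c \sqcup c' \vdash p_2 : \varrho \Rightarrow \varrho_2 \qquad \varrho' = \varrho_1 \sqcup \varrho_2}
     {c' \vdash \texttt{if}\ x\ \texttt{then}\ p_1\ \texttt{else}\ p_2 : \varrho \Rightarrow \varrho'}\ (\text{T-IF})
$$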
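A minimal sketch of how the refined T-IF rule feeds declassification results back into type inference. It assumes a two-level lattice Pub ⊑ Sec, a toy statement encoding, and a simplified flow-sensitive assignment rule (the level of the assigned variable becomes the join of the context and the levels of the variables read). Only the T-IF case follows the rule above; the assignment rule, the encoding and the names `Level`, `infer` and `declassifiable` are hypothetical illustrations, not PoS4MPC's type system.

```python
from enum import IntEnum

class Level(IntEnum):
    PUB = 0
    SEC = 1

def join(l1, l2):
    return max(l1, l2)

def join_policy(r1, r2):
    """Pointwise join of two security policies (missing variables count as PUB)."""
    return {v: join(r1.get(v, Level.PUB), r2.get(v, Level.PUB))
            for v in r1.keys() | r2.keys()}

def infer(stmts, policy, ctx, declassifiable):
    """ctx |- stmts : policy => result, for assignments and conditionals."""
    rho = dict(policy)
    for stmt in stmts:
        if stmt[0] == "assign":           # ("assign", y, [variables read by the rhs])
            _, y, reads = stmt
            level = ctx
            for v in reads:
                level = join(level, rho.get(v, Level.PUB))
            rho[y] = level
        elif stmt[0] == "if":             # refined T-IF
            _, x, p1, p2 = stmt
            c = Level.PUB if x in declassifiable else rho.get(x, Level.PUB)
            rho1 = infer(p1, rho, join(c, ctx), declassifiable)
            rho2 = infer(p2, rho, join(c, ctx), declassifiable)
            rho = join_policy(rho1, rho2)
        else:
            raise ValueError(f"unknown statement {stmt!r}")
    return rho

# if c then y := 1 else y := 2, with c secret
prog = [("if", "c", [("assign", "y", [])], [("assign", "y", [])])]
print(infer(prog, {"c": Level.SEC}, Level.PUB, set())["y"].name)   # SEC
print(infer(prog, {"c": Level.SEC}, Level.PUB, {"c"})["y"].name)   # PUB
```

When c is reported as declassifiable, both branches are typed under a Pub context, so y stays Pub; otherwise the secret context makes y Sec.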


7 Implementation and Evaluation 


We have implemented our approach in a tool named PoS4MPC. The workflow of PoS4MPC is shown in Fig. 5. The input is an MPC program in C, which is parsed into an intermediate representation (IR) by the LLVM compiler [1], and the call graph and control-flow graphs are constructed at the LLVM IR level. We then perform type inference, which computes a perfect security policy for the given program. For accuracy, we perform a field-sensitive pointer analysis [6], and our type inference is also field-sensitive. Next, we leverage the KLEE symbolic execution engine [10] to explore all the feasible symbolic executions, together with the symbolic values of the return variable and of the secret branching variables in each symbolic execution. We fully explore loops, since loop bounds in MPC are public and determined by user-specified inputs. Based on these, we iteratively check whether each secret branching variable can be declassified, and the result is fed back to the type inference to refine security levels before checking the next secret branching variable. Finally, we transform the program into the input of Obliv-C [44], which compiles the program into executable implementations, one for each party. Obliv-C is an extension of C for implementing 2-party MPC applications using Yao's garbled circuit protocol [43]. For experimental purposes, PoS4MPC also supports the high-level MPC framework MPyC [37], a Python package for implementing n-party MPC applications (n > 1) using Shamir's secret sharing protocol [38]. The C program is translated into Python by a translator.

Table 2. Number of (secret) branching variables

| Name | LOC | #Branch var | #Other var | #Secret branch var: After TS | #Secret branch var: After Check | #Other secret var: Before refinement | #Other secret var: After refinement |
|---|---|---|---|---|---|---|---|
| QS | 56 | 4 | 6 | 3 | 0 | 4 | 2 |
| LinS | 25 | 1 | 3 | 1 | 0 | 2 | 1 |
| BinS | 46 | 2 | 8 | 2 | 1 | 6 | 6 |
| AlmS | 73 | 6 | 10 | 6 | 4 | 8 | 8 |
| PSI | 34 | 1 | 5 | 1 | 0 | 3 | 1 |

We also implement an optimization in our tool to alleviate the path explosion problem. Instead of directly checking the validity of $\Psi_x(t, t')$ for each secret branching variable $x$ and pair of symbolic executions $t$ and $t'$, we first check whether the premise $\phi \wedge \mathrm{Primed}(\phi') \wedge e = \mathrm{Primed}(e')$ of $\Psi_x(t, t')$ is satisfiable. If the premise is unsatisfiable, we can conclude that $\Psi_x(t, t')$ is valid for every secret branching variable $x$. Furthermore, this yields a sound compositional reasoning approach that allows us to split a program into a sequence of function calls. When no two distinct symbolic executions of each called function can result in the same return value, we can conclude that $\Psi_x(t, t')$ is valid for any secret branching variable $x$ and any pair of symbolic executions $t$ and $t'$ of the entire program. This optimization reduces the symbolic execution time of PSI (resp. QS) from 95.9s-8.1h (resp. 504.6s) to 1.7s-79.6s (resp. 11.6s) as the input array size varies from 10 to 100 (resp. for input array size 10).
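The premise-first optimization can be sketched as follows, again by finite-domain enumeration rather than an SMT solver and with hypothetical names (`premise_models`, `declassifiable`). For each ordered pair $(t, t')$ it first collects the models of $\phi \wedge \mathrm{Primed}(\phi') \wedge e = \mathrm{Primed}(e')$; if there are none, $\Psi_x(t, t')$ is valid for every $x$ and the pair is skipped, otherwise each remaining candidate $x$ is checked against those models. The sketch assumes every secret branching variable is evaluated once per path and does not show the per-function compositional variant.

```python
from itertools import product

def states(names, domain):
    for vals in product(domain, repeat=len(names)):
        yield dict(zip(names, vals))

def premise_models(t, t2, names, domain):
    """Pairs (sigma, sigma') satisfying phi /\\ Primed(phi') /\\ e = Primed(e')."""
    (phi, e, _), (phi2, e2, _) = t, t2
    return [(s, s2)
            for s in states(names, domain) if phi(s)
            for s2 in states(names, domain) if phi2(s2) and e(s) == e2(s2)]

def declassifiable(symexe, branch_vars, names, domain):
    """Branching variables x whose Psi_x(t, t') holds on every pair (t, t').

    Each symbolic execution is (phi, e, vals), where vals maps each secret
    branching variable to its symbolic value on that path. Also returns how
    many pairs were skipped because their premise was unsatisfiable.
    """
    ok, skipped = set(branch_vars), 0
    for t, t2 in product(symexe, repeat=2):
        models = premise_models(t, t2, names, domain)
        if not models:      # unsatisfiable premise: Psi_x(t, t') valid for every x
            skipped += 1
            continue
        for x in sorted(ok):
            if any(t[2][x](s) != t2[2][x](s2) for s, s2 in models):
                ok.discard(x)
    return ok, skipped

gt_ab = lambda s: s["a"] > s["b"]
le_ab = lambda s: not (s["a"] > s["b"])
# Two paths returning different constants: both cross pairs are skipped.
distinct = [(gt_ab, lambda s: 1, {"x": gt_ab}), (le_ab, lambda s: 2, {"x": gt_ab})]
print(declassifiable(distinct, ["x"], ["a", "b"], range(3)))  # ({'x'}, 2)
# Both paths return 0, so the result does not determine x.
same = [(gt_ab, lambda s: 0, {"x": gt_ab}), (le_ab, lambda s: 0, {"x": gt_ab})]
print(declassifiable(same, ["x"], ["a", "b"], range(3)))      # (set(), 0)
```

In the first case the premise check alone settles both cross pairs, which is the saving the optimization aims for; pairing a path with itself is always consistent here because the path condition already fixes the branch taken on $x$.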


7.1 Evaluation Setup 


To evaluate our approach, we conduct experiments on five typical 2-party MPC applications [2], i.e., quicksort (QS) [21], linear search (LinS) [13], binary search (BinS) [13], almost search (AlmS), and private set intersection (PSI) [5]. QS outputs the list of indices of a given integer array a in its sorted version, where the first half of a is given by one party and the second half of a by the other party. LinS (resp. BinS and AlmS) outputs the index of an integer b in an array a if it exists, and -1 otherwise, where the integer array a is the input of one party and the integer b is the input of the other party. LinS always scans the array from start to end, even after it has found the integer b. BinS is a standard iterative binary search on a sorted array, where the array index is protected via oblivious RAM [20]. AlmS is a variant of BinS where the input array is almost sorted, namely, each element is either at its correct position or at a neighbouring position. PSI outputs the intersection of two integer sets, each of which is the input of one party.
All the experiments were conducted on a desktop with 64-bit Linux Mint 20.1, an Intel Core i5-6300HQ CPU (2.30GHz) and 8 GB RAM. When evaluating MPC applications, the client of each party is executed with a single thread.


7.2 Performance of Security Policy Synthesis 


Table 3. Execution time of our security policy synthesis approach

| Name | 10: SE | 10: Check | 20: SE | 20: Check | 30: SE | 30: Check | 40: SE | 40: Check | 50: SE | 50: Check | 60: SE | 60: Check | 70: SE | 70: Check | 80: SE | 80: Check | 90: SE | 90: Check | 100: SE | 100: Check |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| QS | 11.6 | 0.8 | 0.4h | 304.2 | 2.0h | 959.8 | 5.0h | 0.6h | 9.5h | 0.9h | 15.5h | 1.3h | 22.6h | 1.6h | 31.0h | 2.0h | 40.7h | 2.3h | 51.6h | 2.7h |
| LinS | 0.4 | 1.0 | 0.6 | 1.0 | 1.0 | 1.0 | 1.4 | 1.0 | 2.0 | 1.1 | 2.6 | 1.1 | 3.4 | 1.2 | 4.2 | 1.2 | 5.2 | 1.3 | 6.2 | 1.4 |
| BinS | 0.8 | 1.1 | 2.1 | 4.3 | 3.8 | 10.2 | 6.4 | 20.0 | 9.5 | 34.8 | 13.8 | 54.6 | 19.5 | 80.1 | 25.6 | 103.4 | 34.1 | 151.4 | 42.7 | 204.7 |
| AlmS | 1.3 | 0.8 | 4.3 | 3.5 | 7.7 | 10.0 | 14.1 | 18.6 | 20.6 | 32.3 | 28.9 | 51.0 | 40.7 | 77.4 | 55.1 | 110.3 | 74.9 | 148.2 | 94.4 | 200.0 |
| PSI | 1.7 | 0.5 | 4.3 | 1.0 | 8.0 | 1.5 | 13.2 | 2.1 | 20.0 | 2.8 | 28.6 | 3.5 | 39.3 | 4.3 | 50.9 | 5.3 | 63.0 | 6.4 | 79.6 | 7.8 |

Security Policy. The results of our approach are shown in Table 2, where column (LOC) shows the number of lines of code, column (#Branch var) shows the number of branching variables, column (#Other var) shows the number of other variables, columns (After TS) and (After Check) respectively show the number of secret branching variables after applying the type system and after checking whether the secret branching variables can be declassified, and columns (Before refinement) and (After refinement) respectively show the number of other secret variables before and after refining the type inference by feeding back the results of the symbolic reasoning. (Note that input variables are excluded from the counts.)

We observe that only a few variables (2 for QS, 1 for LinS, 2 for BinS, 2 for AlmS and 2 for PSI) can be found to be public by solely using the type system. With our symbolic reasoning approach, more secret branching variables can be declassified without compromising security (3 for QS, 1 for LinS, 1 for BinS, 2 for AlmS and 1 for PSI). After refining the type inference using the results of the symbolic reasoning approach, more secret variables can be declassified (2 for QS, 1 for LinS and 2 for PSI). Overall, our approach annotates 2, 1, 7, 12 and 1 internal variables as secret out of 10, 4, 10, 16 and 6 variables for QS, LinS, BinS, AlmS and PSI, respectively.


Execution Time. The execution time of our approach is shown in Table 3, where columns (SE) and (Check) respectively show the execution time (in seconds, unless marked h for hours) of collecting symbolic executions and of checking whether secret branching variables can be declassified, varying the size of the input array of each program from 10 to 100 in steps of 10. We do not report the execution time of our type system, as it is less than 0.1s for each benchmark.

We observe that our symbolic reasoning approach is able to check all the secret branching variables within a few minutes (up to 294.4s) except for QS. After an in-depth analysis, we found that the number of symbolic executions is exponential in the length of the input array for QS and PSI, while it is linear for the other benchmarks. Our compositional reasoning approach works very well on PSI; without it, PSI would take an execution time similar to that of QS. Indeed, a loop of PSI is implemented as a sequence of function calls, each of which has a fixed number of symbolic executions. Furthermore, no two distinct symbolic executions of the called function can result in the same return value. Therefore, the number of symbolic executions and the execution time of our symbolic reasoning approach are reduced significantly. However, our compositional reasoning approach does not work on QS. Although the number of symbolic executions grows exponentially on QS, the execution time of checking whether secret branching variables can be declassified is still reduced by our optimization, which avoids checking the constraint $\Psi_x(t, t')$ if its premise $\phi \wedge \mathrm{Primed}(\phi') \wedge e = \mathrm{Primed}(e')$ is unsatisfiable.

Fig. 7. Execution time (Time) in seconds using MPyC


7.3 Performance Improvement of MPC Applications 


To evaluate the performance improvement of the MPC applications, we compare the execution time (in seconds), the size of the circuits (in 10°xgates), and the volume of communication traffic (in MB) of each benchmark under the security policies v1 and v2, where v1 is obtained by solely applying our type system and v2 is obtained from v1 by degrading security levels and refinement without compromising security. The measurement results are calculated as $\frac{\text{result of } v1}{\text{result of } v2} - 1$, taking the average of 10 repetitions in order to minimize noise.
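To make the metric concrete, here is a minimal sketch of the ratio above. It assumes that the 10 repetitions of each policy are averaged first and the ratio is taken afterwards; the function name `improvement` and the timing values are hypothetical and only illustrate the arithmetic.

```python
from statistics import mean

def improvement(v1_runs, v2_runs):
    """(result of v1) / (result of v2) - 1, each result averaged over its runs."""
    return mean(v1_runs) / mean(v2_runs) - 1

# Hypothetical timings: v1 averages 2.9s and v2 averages 2.0s over 10 runs.
print(f"{improvement([2.9] * 10, [2.0] * 10):.0%}")  # 45%
```

Under this metric, a reported 368% means that the v1 result is 4.68 times the v2 result, which is why some of the reductions below exceed 100%.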


Obliv-C. The results in Obliv-C are depicted in Fig. 6 (note the logarithmic scale of the vertical axis), where the size of the random input array for each benchmark varies from 10 to 100 in steps of 10. Overall, we observe that the performance improvement is significant, especially on QS. In detail, compared with the security policy v1 on QS (resp. LinS, BinS, AlmS, and PSI), the security policy v2 on average reduces (1) the execution time by 1.56 x 10°% (resp. 45%, 38%, 31% and 36%), (2) the size of circuits by $3.61 \times 10^{5}\%$ (resp. 368%, 52%, 38% and 275%), and (3) the volume of communication traffic by 4.17 x 10°% (resp. 367%, 53%, 39% and 274%). This demonstrates the performance improvement of MPC applications in Obliv-C, which uses Yao's garbled circuit protocol.
MPyC. The results in MPyC are depicted in Fig. 7. Since MPyC does not provide the size of circuits or the volume of communication traffic, we only report execution time in Fig. 7. The results show that degrading security levels also improves execution time in MPyC, which uses Shamir's secret sharing protocol. Compared with the security policy v1 on benchmark QS (resp. LinS, BinS, AlmS, and PSI), the security policy v2 on average reduces the execution time by $2.5 \times 10^{4}\%$ (resp. 64%, 23%, 17% and 996%).

We note the difference between the improvements in Obliv-C and MPyC. This is because (1) Obliv-C and MPyC use different MPC protocols with varying improvements, where Yao's protocol (Obliv-C) is efficient for Boolean computations while the secret-sharing protocol (MPyC) is efficient for arithmetic computations; and (2) the proportion of downgraded variables differs, and a larger proportion of downgraded variables (in particular, branching variables with large branches) boosts performance more.


8 Related Work 


MPC Frameworks. Early efforts on MPC frameworks provide high-level languages for specifying MPC applications and compilers for translating them into executable implementations [8,23,31,32]. For instance, Fairplay compiles 2-party MPC programs written in a domain-specific language into Yao's garbled circuits [31]. FairplayMP [8] extends Fairplay to the multi-party setting using a modified version of the BMR protocol [7] with a Java interface. The others aim at improving the efficiency of operations in circuits and the size of circuits. Mixed MPC protocols were also proposed to improve efficiency [9,26,34], as the efficiency of MPC protocols varies across operations. These frameworks explore the implementation space of operations in specific MPC protocols (e.g., garbled circuits, secret sharing and homomorphic encryption), as well as their conversions. However, all these frameworks either compile an entire MPC program or compile an MPC program according to user-annotated secret variables to improve performance, without formal security guarantees. Our approach improves the performance of MPC applications by declassifying secret variables without compromising security, which is orthogonal to the above optimization work.


Security of MPC Applications. MPC applications implemented in MPC frameworks are not necessarily secure, due to information leakage during execution in the real world. Therefore, information-flow type systems and data-flow analysis have been adopted in MPC frameworks, e.g., [24,37,44]. However, they only consider security verification, not the automatic generation of security policies as we do in the current paper. Moreover, these approaches cannot identify some variables (e.g., c2 in our motivating example) that can actually be declassified without compromising security. Kerschbaum [25] proposed to infer public intermediate values by reasoning in epistemic modal logic, with a goal similar to ours of declassifying secret variables. However, it is unclear how efficient this approach is, as its performance was not reported [25].
Alternatively, self-composition, which reduces the security problem to a safety problem on two copies of a program, has been adopted by [3], where the safety problem can be solved by safety verification tools. However, safety verification remains challenging, and these approaches often require user annotations (e.g., procedure contracts and loop invariants) that are non-trivial for MPC practitioners. Our work differs from them in that: (1) they only use the self-composition reduction to verify security instead of automatically generating a security policy; (2) they have to check almost all the program variables, which is computationally expensive, while we first apply an efficient type system to infer a security policy and then only check whether the secret branching variables in the security policy can be declassified; and (3) we check whether secret branching variables can be declassified by reasoning about pairs of symbolic executions, which can be seen as a divide-and-conquer approach without annotations, and the results can be fed back to the type system to efficiently refine security levels. We remark that the self-composition reduction could also be used to check whether a secret branching variable can be declassified.


Information-Flow Analysis. A rich body of literature has studied the verification of information-flow security and noninterference in programs [12], which requires that confidential data do not flow to outputs. This is too restrictive for programs that allow secret data to flow to some non-secret outputs, e.g., MPC applications; therefore, the security notion was later extended with declassification (a.k.a. delimited release) [27]. These security problems are verified by type systems (e.g., [27]), self-composition (e.g., [39]) or relational reasoning (e.g., [4]). Some of these techniques have been adapted to verify timing side-channel security, e.g., [11,30,42]. However, as the usual notions of security in these settings do not require reasoning about arbitrary leakage, these techniques are not directly applicable to our setting. Different from existing analyses using symbolic execution [33], our approach takes advantage of the public outputs of MPC programs and regards them as part of the leakage, to avoid the false positives of the noninterference approach and the quantification of information flow.

Finally, we remark that the leakage model considered in this work differs from those used in power side-channel security [16-19,45] and timing side-channel security [11,30,36,42], which leverage side-channel information, whereas ours assumes that the adversary is able to observe all the public information during computation.


9 Conclusion 


We have formalized the leakage of an MPC application, which bridges the language-level and protocol-level leakages via security policies. Based on this formalization, we have presented an approach to automatically synthesize a security policy that can improve the performance of MPC applications without compromising their privacy. Our approach is essentially a synergistic integration of


404 Y. Fan et al. 


type inference and symbolic reasoning with security type refinement. We implemented our approach in a tool, PoS4MPC. The experimental results on five typical MPC applications confirm that our approach can significantly improve the performance of MPC applications.


References 


1. The LLVM compiler infrastructure. https://llvm.org

2. The source code of our benchmarks (2022). https://github.com/SPo0S4/PoS4MPC

3. Almeida, J.B., Barbosa, M., Barthe, G., Pacheco, H., Pereira, V., Portela, B.: Enforcing ideal-world leakage bounds in real-world secret sharing MPC frameworks. In: CSF, pp. 132-146 (2018)

4. Amtoft, T., Bandhakavi, S., Banerjee, A.: A logic for information flow in object-oriented programs. In: POPL, pp. 91-102 (2006)

5. Andreea, I.: Private set intersection: past, present and future. In: SECRYPT, pp. 680-685 (2021)

6. Balatsouras, G., Smaragdakis, Y.: Structure-sensitive points-to analysis for C and C++. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 84-104. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_5

7. Beaver, D., Micali, S., Rogaway, P.: The round complexity of secure protocols. In: STOC, pp. 503-513 (1990)

8. Ben-David, A., Nisan, N., Pinkas, B.: FairplayMP: a system for secure multi-party computation. In: CCS, pp. 257-266 (2008)

9. Büscher, N., Demmler, D., Katzenbeisser, S., Kretzmer, D., Schneider, T.: HyCC: compilation of hybrid protocols for practical secure computation. In: CCS, pp. 847-861 (2018)

10. Cadar, C., Dunbar, D., Engler, D.R.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: OSDI, pp. 209-224 (2008)

11. Chen, J., Feng, Y., Dillig, I.: Precise detection of side-channel vulnerabilities using quantitative Cartesian Hoare logic. In: CCS, pp. 875-890 (2017)

12. Denning, D.E., Denning, P.J.: Certification of programs for secure information flow. Commun. ACM 20(7), 504-513 (1977)

13. Doerner, J.: The absentminded crypto kit. https://bitbucket.org/jackdoerner/absentminded-crypto-kit/

14. Evans, D., Kolesnikov, V., Rosulek, M.: A pragmatic introduction to secure multi-party computation. Found. Trends Priv. Secur. 2(2-3), 70-246 (2018)

15. Fan, Y., Song, F., Chen, T., Zhang, L., Liu, W.: PoS4MPC: automated security policy synthesis for secure multi-party computation. Technical report, ShanghaiTech University (2022). https://faculty.sist.shanghaitech.edu.cn/faculty/songfu/publications/CAV22full.pdf

16. Gao, P., Xie, H., Song, F., Chen, T.: A hybrid approach to formal verification of higher-order masked arithmetic programs. ACM Trans. Softw. Eng. Methodol. 30(3), 26:1-26:42 (2021)

17. Gao, P., Xie, H., Sun, P., Zhang, J., Song, F., Chen, T.: Formal verification of masking countermeasures for arithmetic programs. IEEE Trans. Softw. Eng. 48(3), 973-1000 (2022)

18. Gao, P., Xie, H., Zhang, J., Song, F., Chen, T.: Quantitative verification of masked arithmetic programs against side-channel attacks. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019. LNCS, vol. 11427, pp. 155-173. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17462-0_9


19. Gao, P., Zhang, J., Song, F., Wang, C.: Verifying and quantifying side-channel resistance of masked software implementations. ACM Trans. Softw. Eng. Methodol. 28(3), 16:1-16:32 (2019)

20. Goldreich, O., Ostrovsky, R.: Software protection and simulation on oblivious RAMs. J. ACM 43(3), 431-473 (1996)

21. Hamada, K., Kikuchi, R., Ikarashi, D., Chida, K., Takahashi, K.: Practically efficient multi-party sorting protocols from comparison sort algorithms. In: ICISC, vol. 7839, pp. 202-216 (2012)

22. Hemenway, B., Lu, S., Ostrovsky, R., Welser IV, W.: High-precision secure computation of satellite collision probabilities. In: Zikas, V., De Prisco, R. (eds.) SCN 2016. LNCS, vol. 9841, pp. 169-187. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44618-9_9

23. Holzer, A., Franz, M., Katzenbeisser, S., Veith, H.: Secure two-party computations in ANSI C. In: CCS, pp. 772-783 (2012)

24. Keller, M.: MP-SPDZ: A versatile framework for multi-party computation. In: CCS, pp. 1575-1590 (2020)

25. Kerschbaum, F.: Automatically optimizing secure computation. In: CCS, pp. 703-714 (2011)

26. Laud, P., Randmets, J.: A domain-specific language for low-level secure multiparty computation protocols. In: CCS, pp. 1492-1503 (2015)

27. Li, P., Zdancewic, S.: Downgrading policies and relaxed noninterference. In: POPL, pp. 158-170 (2005)

28. Lindell, Y.: Secure multiparty computation. Commun. ACM 64(1), 86-96 (2021)

29. Liu, C., Wang, X.S., Nayak, K., Huang, Y., Shi, E.: ObliVM: a programming framework for secure computation. In: S&P, pp. 359-376 (2015)

30. Malacaria, P., Khouzani, M.H.R., Pasareanu, C.S., Phan, Q., Luckow, K.S.: Symbolic side-channel analysis for probabilistic programs. In: CSF, pp. 313-327 (2018)

31. Malkhi, D., Nisan, N., Pinkas, B., Sella, Y.: Fairplay - secure two-party computation system. In: USENIX Security Symposium, pp. 287-302 (2004)

32. Mood, B., Gupta, D., Carter, H., Butler, K.R.B., Traynor, P.: Frigate: a validated, extensible, and efficient compiler and interpreter for secure computation. In: EuroS&P, pp. 112-127 (2016)

33. Pasareanu, C.S., Kersten, R., Luckow, K.S., Phan, Q.: Chapter six - symbolic execution and recent applications to worst-case execution, load testing, and security analysis. Adv. Comput. 113, 289-314 (2019)

34. Patra, A., Schneider, T., Suresh, A., Yalame, H.: ABY2.0: improved mixed-protocol secure two-party computation. In: USENIX Security Symposium, pp. 2165-2182 (2021)

35. Poddar, R., Kalra, S., Yanai, A., Deng, R., Popa, R.A., Hellerstein, J.M.: Senate: a maliciously-secure MPC platform for collaborative analytics. In: USENIX Security Symposium, pp. 2129-2146 (2021)

36. Qin, Q., JiYang, J., Song, F., Chen, T., Xing, X.: Preventing timing side-channels via security-aware just-in-time compilation. CoRR abs/2202.13134 (2022)

37. Schoenmakers, B.: MPyC: secure multiparty computation in Python (2020). https://github.com/lschoe/mpyc

38. Shamir, A.: How to share a secret. Commun. ACM 22(11), 612-613 (1979)

39. Terauchi, T., Aiken, A.: Secure information flow as a safety problem. In: Hankin, C., Siveroni, I. (eds.) SAS 2005. LNCS, vol. 3672, pp. 352-367. Springer, Heidelberg (2005). https://doi.org/10.1007/11547662_24

40. Volpano, D.M., Irvine, C.E., Smith, G.: A sound type system for secure flow analysis. J. Comput. Secur. 4(2/3), 167-188 (1996)


41. Wagh, S., Gupta, D., Chandran, N.: SecureNN: efficient and private neural network training. IACR Cryptology ePrint Archive, p. 442 (2018)

42. Yang, W., Vizel, Y., Subramanyan, P., Gupta, A., Malik, S.: Lazy self-composition for security verification. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10982, pp. 136-156. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96142-2_11

43. Yao, A.C.: Protocols for secure computations. In: FOCS, pp. 160-164 (1982)

44. Zahur, S., Evans, D.: Obliv-C: a language for extensible data-oblivious computation. IACR Cryptology ePrint Archive, p. 1153 (2015)

45. Zhang, J., Gao, P., Song, F., Wang, C.: SCInfer: refinement-based verification of software countermeasures against side-channel attacks. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10982, pp. 157-177. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96142-2_12


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 


The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.




Explaining Hyperproperty Violations 


Norine Coenen¹(✉), Raimund Dachselt², Bernd Finkbeiner¹, Hadar Frenkel¹, Christopher Hahn¹, Tom Horak³, Niklas Metzger¹, and Julian Siber¹


1 CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
{norine.coenen,finkbeiner,hadar.frenkel,christopher.hahn,niklas.metzger,julian.siber}@cispa.de
2 Interactive Media Lab, Technische Universität Dresden, Dresden, Germany
dachselt@acm.org


3 elevait GmbH & Co. KG, Dresden, Germany

Abstract. Hyperproperties relate multiple computation traces to each other. Model checkers for hyperproperties thus return, in case a system model violates the specification, a set of traces as a counterexample. Fixing the erroneous relations between traces in the system that led to the counterexample is a difficult manual effort that greatly benefits from additional explanations. In this paper, we present an explanation method for counterexamples to hyperproperties described in the specification logic HyperLTL. We extend Halpern and Pearl’s definition of actual causality to sets of traces witnessing the violation of a HyperLTL formula, which allows us to identify the events that caused the violation. We report on the implementation of our method and show that it significantly improves on previous approaches for analyzing counterexamples returned by HyperLTL model checkers.
1 Introduction


While model checking algorithms and tools (e.g., [12,17,18,26,47,55]) have, in the past, focused on trace properties, recent failures in security-critical systems, such as Heartbleed [28], Meltdown [59], Spectre [52], or Log4j [1], have triggered the development of model checking algorithms for properties that relate multiple computation traces to each other, i.e., hyperproperties [21]. Although the counterexample returned by such a model checker for hyperproperties, which takes the shape of a set of traces, may aid in the debugging process, understanding and narrowing down which features are actually responsible for the erroneous relation between the traces in the counterexample requires significantly more manual effort than for trace properties. In this paper, we develop an explanation technique for these more complex counterexamples that identifies the actual causes [44-46] of hyperproperty violations.

Existing hyperproperty model checking approaches (e.g., [33,35,49]) take a HyperLTL formula as an input. HyperLTL is a temporal logic extending LTL with explicit trace quantification [20]. For example, observational determinism, which requires that all traces $\pi, \pi'$ agree on their observable outputs $lo$ whenever they agree on their observable inputs $li$, can be formalized in HyperLTL as $\forall \pi. \forall \pi'. \square(li_\pi \leftrightarrow li_{\pi'}) \rightarrow \square(lo_\pi \leftrightarrow lo_{\pi'})$. In case a system model violates observational determinism, the model checker consequently returns a set of two execution traces witnessing the violation.

A first attempt at explaining model checking results of HyperLTL specifications has been made with HyperVis [48], which visualizes a counterexample returned by the model checker MCHyper [35] in a browser application. While these visualizations are already useful for analyzing the counterexample at hand, HyperVis fails to identify causes for the violation in several security-critical scenarios. This is because HyperVis identifies important atomic propositions that appear in the HyperLTL formula and highlights these in the trace and the formula. For detecting causes, however, this is insufficient: a cause for a violation of observational determinism, for example, could be a branch on the valuation of a secret input $i_s$, which is not even part of the formula (see Sect. 3 for a running example).

Defining what constitutes an actual cause for an effect (a violation) in a given scenario is a valuable contribution by Halpern and Pearl [44-46], who refined and formalized earlier approaches based on counterfactual reasoning [58]: causes are sets of events such that, in the counterfactual world where they do not appear, the effect does not occur either. One of the main insights of Halpern and Pearl’s work, however, is that naive counterfactuals are too imprecise. If, for instance, our actual cause preempted another potential cause, the mere absence of the actual cause will not be enough to prevent the effect, which will still be produced by the other cause in the new scenario. Halpern and Pearl’s definition therefore allows one to carefully control for other possible causes through the notion of contingencies. In the modified definition [44], contingencies allow one to fix certain features of the counterfactual world to be exactly as they are in the actual world, regardless of the system at hand. Such a contingency effectively modifies the dynamics of the underlying model, and one insight of our work is that defining actual causality for reactive systems also requires modifying the system under a contingency. Notably, most works on trace causality [13,39] consider only counterfactuals and not contingencies, and thus are not able to find true actual causes.

In this paper, we develop the notion of actual causality for effects described by HyperLTL formulas and use the generated causes as explanations for counterexamples returned by a model checker. We show that an implementation of our algorithm is practically feasible and significantly advances the state of the art in explaining and analyzing HyperLTL model checking results.
2 Preliminaries


We model a system as a Moore machine [62] $T = (S, s_0, AP, \delta, l)$ where $S$ is a finite set of states, $s_0 \in S$ is the initial state, $AP = I \cup O$ is the set of atomic propositions consisting of inputs $I$ and outputs $O$, $\delta : S \times 2^I \to S$ is the transition function determining the successor state for a given state and set of inputs, and $l : S \to 2^O$ is the labeling function mapping each state to a set of outputs. A trace $t = t_0 t_1 t_2 \ldots \in (2^{AP})^\omega$ of $T$ is an infinite sequence of sets of atomic propositions with $t_i = A \cup l(s_i)$, where $A \subseteq I$ and $\delta(s_i, A) = s_{i+1}$ for all $i \geq 0$. We usually write $t[n]$ to refer to the set $t_n$ at the $(n+1)$-th position of $t$. With $traces(T)$, we denote the set of all traces of $T$. For some sequence of inputs $a = a_0 a_1 a_2 \ldots \in (2^I)^\omega$, the trace $T(a)$ is defined by $T(a)_i = a_i \cup l(s_i)$ and $\delta(s_i, a_i) = s_{i+1}$ for all $i \geq 0$. A trace property $P \subseteq (2^{AP})^\omega$ is a set of traces. A hyperproperty $H$ is a lifting of a trace property, i.e., a set of sets of traces. A model $T$ satisfies a hyperproperty $H$ if the set of traces of $T$ is an element of the hyperproperty, i.e., $traces(T) \in H$.
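To make the Moore-machine definitions concrete, the following Python sketch computes a finite prefix of the trace T(a) induced by an input sequence a. It is an illustration under our own assumptions: the class name MooreMachine, the explicit-state encoding with string states and frozensets of propositions, and the toy machine are hypothetical and not part of the paper or of any tool.

```python
from typing import Callable, Dict, FrozenSet, List

Letter = FrozenSet[str]  # a set of atomic propositions


class MooreMachine:
    """Explicit-state Moore machine T = (S, s0, AP = I ∪ O, delta, l)."""

    def __init__(self, s0: str, inputs: Letter,
                 delta: Callable[[str, Letter], str],
                 label: Dict[str, Letter]) -> None:
        self.s0 = s0          # initial state
        self.inputs = inputs  # the input propositions I
        self.delta = delta    # transition function S x 2^I -> S
        self.label = label    # labeling function S -> 2^O

    def trace(self, a: List[Letter]) -> List[Letter]:
        """Finite prefix of T(a): position i holds a_i ∪ l(s_i)."""
        s, prefix = self.s0, []
        for a_i in a:
            if not a_i <= self.inputs:
                raise ValueError(f"{set(a_i)} is not a subset of I")
            prefix.append(a_i | self.label[s])
            s = self.delta(s, a_i)  # s_{i+1} = delta(s_i, a_i)
        return prefix


# Toy machine (hypothetical, not from the paper): output y appears one step after input x.
toy = MooreMachine(
    s0="off",
    inputs=frozenset({"x"}),
    delta=lambda s, A: "on" if "x" in A else "off",
    label={"off": frozenset(), "on": frozenset({"y"})},
)
print(toy.trace([frozenset({"x"}), frozenset(), frozenset()]))
# [frozenset({'x'}), frozenset({'y'}), frozenset()]
```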


2.1 HyperLTL 


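HyperLTL is a recently introduced logic for expressing temporal hyperproperties, extending linear-time temporal logic (LTL) [64] with trace quantification:

$$\varphi ::= \forall \pi. \varphi \mid \exists \pi. \varphi \mid \psi$$
$$\psi ::= a_\pi \mid \neg \psi \mid \psi \wedge \psi \mid \bigcirc \psi \mid \psi \,\mathcal{U}\, \psi$$

We also consider the usual derived Boolean ($\vee$, $\rightarrow$, $\leftrightarrow$) and temporal operators ($\varphi \,\mathcal{R}\, \psi \equiv \neg(\neg\varphi \,\mathcal{U}\, \neg\psi)$, $\Diamond \psi \equiv true \,\mathcal{U}\, \psi$, $\square \psi \equiv false \,\mathcal{R}\, \psi$). The semantics of HyperLTL formulas is defined with respect to a set of traces $Tr$ and a trace assignment $\Pi : \mathcal{V} \to Tr$ that maps trace variables to traces. To update the trace assignment so that it maps trace variable $\pi$ to trace $t$, we write $\Pi[\pi \mapsto t]$.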


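$$
\begin{array}{lcl}
\Pi, i \models_{Tr} a_\pi & \text{iff} & a \in \Pi(\pi)[i] \\
\Pi, i \models_{Tr} \neg \psi & \text{iff} & \Pi, i \not\models_{Tr} \psi \\
\Pi, i \models_{Tr} \psi \wedge \psi' & \text{iff} & \Pi, i \models_{Tr} \psi \text{ and } \Pi, i \models_{Tr} \psi' \\
\Pi, i \models_{Tr} \bigcirc \psi & \text{iff} & \Pi, i+1 \models_{Tr} \psi \\
\Pi, i \models_{Tr} \psi \,\mathcal{U}\, \psi' & \text{iff} & \exists j \geq i.\ \Pi, j \models_{Tr} \psi' \wedge \forall i \leq k < j.\ \Pi, k \models_{Tr} \psi \\
\Pi, i \models_{Tr} \exists \pi. \varphi & \text{iff} & \text{there is some } t \in Tr \text{ such that } \Pi[\pi \mapsto t], i \models_{Tr} \varphi \\
\Pi, i \models_{Tr} \forall \pi. \varphi & \text{iff} & \text{for all } t \in Tr \text{ it holds that } \Pi[\pi \mapsto t], i \models_{Tr} \varphi
\end{array}
$$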
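The semantics can be checked mechanically on ultimately periodic (lasso-shaped) traces. The sketch below is a naive recursive evaluator written for illustration only; the tuple encoding of formulas, the names Lasso and holds, and the windowing argument for until (the joint suffix of all assigned traces repeats after the longest prefix, with a period equal to the lcm of the loop lengths) are our assumptions. As a usage example, it evaluates observational determinism on the two traces of the counterexample from the running example in Sect. 3.

```python
from dataclasses import dataclass
from math import lcm
from typing import Dict, FrozenSet, List, Tuple

Letter = FrozenSet[str]


@dataclass(frozen=True)
class Lasso:
    """Ultimately periodic trace u v^omega (prefix u, non-empty loop v)."""
    prefix: Tuple[Letter, ...]
    loop: Tuple[Letter, ...]

    def at(self, i: int) -> Letter:
        if i < len(self.prefix):
            return self.prefix[i]
        return self.loop[(i - len(self.prefix)) % len(self.loop)]


# Formulas as nested tuples: ("true",), ("ap", a, pi), ("not", f), ("and", f, g),
# ("next", f), ("until", f, g), ("forall", pi, f), ("exists", pi, f).
def Not(f): return ("not", f)
def And(f, g): return ("and", f, g)
def Implies(f, g): return Not(And(f, Not(g)))
def Iff(f, g): return And(Implies(f, g), Implies(g, f))
def Globally(f): return Not(("until", ("true",), Not(f)))  # G f = not (true U not f)


def holds(f, assign: Dict[str, Lasso], i: int, traces: List[Lasso]) -> bool:
    """Decide Pi, i |=_Tr f, where assign plays the role of Pi and traces of Tr."""
    op = f[0]
    if op == "true":
        return True
    if op == "ap":
        return f[1] in assign[f[2]].at(i)
    if op == "not":
        return not holds(f[1], assign, i, traces)
    if op == "and":
        return holds(f[1], assign, i, traces) and holds(f[2], assign, i, traces)
    if op == "next":
        return holds(f[1], assign, i + 1, traces)
    if op == "until":
        # From position `start` on, the joint suffix repeats with period `period`,
        # so scanning start + period positions from i sees every distinct suffix.
        start = max((len(t.prefix) for t in assign.values()), default=0)
        period = lcm(*(len(t.loop) for t in assign.values()))
        for j in range(i, i + start + period):
            if holds(f[2], assign, j, traces):
                return True
            if not holds(f[1], assign, j, traces):
                return False
        return False
    if op == "forall":
        return all(holds(f[2], {**assign, f[1]: t}, i, traces) for t in traces)
    if op == "exists":
        return any(holds(f[2], {**assign, f[1]: t}, i, traces) for t in traces)
    raise ValueError(f"unknown operator {op}")


F = frozenset
OD = ("forall", "p", ("forall", "q", Implies(
    Globally(Iff(("ap", "li", "p"), ("ap", "li", "q"))),
    Globally(Iff(("ap", "lo", "p"), ("ap", "lo", "q"))))))

t1 = Lasso((F(), F({"lo"})), (F({"ho", "lo"}),))
t2 = Lasso((F({"hi"}), F({"hi", "ho"})), (F({"ho", "lo"}),))
print(holds(OD, {}, 0, [t1, t2]))  # False: lo differs at position 1
```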


We explain counterexamples found by MCHyper [24,35], which is a model checker for HyperLTL formulas, building on ABC [12]. MCHyper takes as inputs a hardware circuit, specified in the AIGER format [8], and a HyperLTL formula. MCHyper solves the model checking problem by computing the self-composition [6] of the system. If the system violates the HyperLTL formula, MCHyper returns a counterexample. This counterexample is a set of traces through the original system that together violate the HyperLTL formula. Depending on the type of violation, this counterexample can then be used to debug the circuit or refine the specification iteratively.
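As an explicit-state illustration of self-composition, the following hypothetical function builds the k-fold product of a Moore machine, so that one trace of the product encodes k traces of the original system and a property over k traces becomes a property of single product traces. MCHyper itself works symbolically on AIGER circuits, which this sketch does not model; the function name, the encoding of a proposition a of copy j as the pair (a, j), and the toy machine are assumptions.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Hashable, List, Tuple

State = Hashable
Letter = FrozenSet[str]


def self_compose(states: List[State], s0: State,
                 delta: Callable[[State, Letter], State],
                 label: Dict[State, Letter], k: int = 2):
    """k-fold self-composition; proposition a of copy j is the pair (a, j)."""
    comp_states = list(product(states, repeat=k))

    def comp_delta(qs: Tuple[State, ...], letter: FrozenSet[Tuple[str, int]]):
        # Copy j only reads the inputs tagged with its own index j.
        return tuple(delta(s, frozenset(a for (a, i) in letter if i == j))
                     for j, s in enumerate(qs))

    def comp_label(qs: Tuple[State, ...]) -> FrozenSet[Tuple[str, int]]:
        # The outputs of copy j are tagged with j, i.e. o becomes o_{pi_j}.
        return frozenset((o, j) for j, s in enumerate(qs) for o in label[s])

    return comp_states, (s0,) * k, comp_delta, comp_label


# Toy machine (hypothetical): output y appears one step after input x.
def toy_delta(s: str, A: Letter) -> str:
    return "on" if "x" in A else "off"


toy_label = {"off": frozenset(), "on": frozenset({"y"})}
states2, init2, delta2, label2 = self_compose(["off", "on"], "off", toy_delta, toy_label)
q = delta2(init2, frozenset({("x", 0)}))  # only the first copy receives x
print(q, sorted(label2(q)))  # ('on', 'off') [('y', 0)]
```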
2.2 Actual Causality


A formal definition of what actually causes an observed effect in a given context has been proposed by Halpern and Pearl [45]. Here, we outline the version later modified by Halpern [44]. Causality is defined with respect to a causal model $M = (\mathcal{S}, \mathcal{F})$, given by a signature $\mathcal{S}$ and set of structural equations $\mathcal{F}$, which define the dynamics of the system. A signature $\mathcal{S}$ is a tuple $(\mathcal{U}, \mathcal{V}, \mathcal{D})$, where $\mathcal{U}$ and $\mathcal{V}$ are disjoint sets of variables, termed exogenous and endogenous variables, respectively; and $\mathcal{D}$ defines the range of possible values $\mathcal{D}(Y)$ for all variables $Y \in \mathcal{U} \cup \mathcal{V}$. A context $\vec{u}$ is an assignment to the variables in $\mathcal{U} \cup \mathcal{V}$ such that the values of the exogenous variables are determined by factors outside of the model, while the value of some endogenous variable $X$ is defined by the associated structural equation $f_X \in \mathcal{F}$. An effect $\varphi$ in a causal model is a Boolean formula over assignments to endogenous variables. We say that a context $\vec{u}$ of a model $M$ satisfies a partial variable assignment $\vec{X} = \vec{x}$ for $\vec{X} \subseteq \mathcal{U} \cup \mathcal{V}$ if the assignments in $\vec{u}$ and in $\vec{x}$ coincide for every variable $X \in \vec{X}$. The extension for Boolean formulas over variable assignments is as expected. For a context $\vec{u}$ and a partial variable assignment $\vec{X} = \vec{x}$, we denote by $(M, \vec{u})[\vec{X} \leftarrow \vec{x}]$ the context $\vec{u}'$ in which the values of the variables in $\vec{X}$ are set according to $\vec{x}$, and all other values are computed according to the structural equations.
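A structural-equation model with interventions can be sketched as follows, assuming integer values, acyclic equations listed in topological order, and a context that fixes only the exogenous variables. The class CausalModel and its method solve are hypothetical names; solve(context, intervention) plays the role of $(M, \vec{u})[\vec{X} \leftarrow \vec{x}]$.

```python
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, int]


class CausalModel:
    """Acyclic structural-equation model M = (S, F).

    `equations` lists (endogenous variable X, f_X) in a topological order;
    every other variable is exogenous and takes its value from the context.
    """

    def __init__(self, equations: List[Tuple[str, Callable[[Assignment], int]]]):
        self.equations = equations
        self.endogenous = {x for x, _ in equations}

    def solve(self, context: Assignment,
              intervention: Optional[Assignment] = None) -> Assignment:
        """All values in (M, u)[X <- x]: intervened variables keep the forced
        value, the remaining endogenous variables follow their equations."""
        intervention = intervention or {}
        values = dict(context)
        values.update({x: v for x, v in intervention.items() if x not in self.endogenous})
        for x, f in self.equations:
            values[x] = intervention[x] if x in intervention else f(values)
        return values


# Toy chain a -> b -> c (hypothetical, not from the paper).
M = CausalModel([("b", lambda v: v["a"]), ("c", lambda v: v["b"])])
print(M.solve({"a": 1}))            # {'a': 1, 'b': 1, 'c': 1}
print(M.solve({"a": 1}, {"b": 0}))  # {'a': 1, 'b': 0, 'c': 0}
```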

The actual causality framework of Halpern and Pearl aims at defining what 
events (given as variable assignments) are the cause for the occurrence of an 
effect in a specific given context. We now provide the formal definition. 


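Definition 1 ([44,45]). A partial variable assignment $\vec{X} = \vec{x}$ is an actual cause of the effect $\varphi$ in $(M, \vec{u})$ if the following three conditions hold.

AC1: $(M, \vec{u}) \models \vec{X} = \vec{x}$ and $(M, \vec{u}) \models \varphi$, i.e., both cause and effect are true in the actual world.

AC2: There is a set $\vec{W} \subseteq \mathcal{V}$ of endogenous variables and an assignment $\vec{x}'$ to the variables in $\vec{X}$ s.t. if $(M, \vec{u}) \models \vec{W} = \vec{w}$, then $(M, \vec{u})[\vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}] \models \neg\varphi$.

AC3: $\vec{X}$ is minimal, i.e., no subset of $\vec{X}$ satisfies AC1 and AC2.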


Intuitively, AC2 states that in the counterfactual world obtained by intervening on the cause $\vec{X} = \vec{x}$ in the actual world (that is, setting the variables in $\vec{X}$ to $\vec{x}'$), the effect does not appear either. However, intervening on the possible cause might not be enough, for example when that cause preempted another. After the intervention, this other cause may produce the effect again, thereby clouding the effect of the intervention. To address this problem, AC2 allows one to reset values through the notion of contingencies, i.e., the set of variables $\vec{W}$ can be reset to $\vec{w}$, which is (implicitly) universally quantified. However, since the actual world has to model $\vec{W} = \vec{w}$, it is in fact uniquely determined. AC3, lastly, enforces the cause to be minimal by requiring that all variables in $\vec{X}$ are strictly necessary to achieve AC1 and AC2. For an illustration of Halpern and Pearl’s actual causality, see Example 1 in Sect. 3.
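For very small binary models, Definition 1 can be checked by brute force. The sketch below enumerates all alternative assignments $\vec{x}'$ and all contingency sets $\vec{W}$ (kept at their actual values), following the modified definition [44]; it is exponential and meant only to illustrate AC1 to AC3. Like Example 1, it allows the cause to range over exogenous variables. The function names and the disjunctive toy model are assumptions, not taken from the paper.

```python
from itertools import chain, combinations, product
from typing import Callable, Dict, Iterable, List, Tuple

Assignment = Dict[str, int]
Equations = List[Tuple[str, Callable[[Assignment], int]]]  # topological order


def solve(eqs: Equations, context: Assignment, fixed: Assignment) -> Assignment:
    endo = {x for x, _ in eqs}
    vals = {**context, **{x: b for x, b in fixed.items() if x not in endo}}
    for x, f in eqs:
        vals[x] = fixed[x] if x in fixed else f(vals)
    return vals


def subsets(xs: Iterable[str]):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))


def ac1(eqs, u, cause, effect) -> bool:
    actual = solve(eqs, u, {})
    return all(actual[x] == b for x, b in cause.items()) and effect(actual)


def ac2(eqs, u, cause, effect) -> bool:
    actual = solve(eqs, u, {})
    xs = list(cause)
    others = [x for x, _ in eqs if x not in cause]  # candidate contingency variables
    for W in subsets(others):
        w = {x: actual[x] for x in W}  # contingency: W keeps its actual values
        for alt in product((0, 1), repeat=len(xs)):
            if not effect(solve(eqs, u, {**dict(zip(xs, alt)), **w})):
                return True
    return False


def is_actual_cause(eqs, u, cause, effect) -> bool:
    if not (ac1(eqs, u, cause, effect) and ac2(eqs, u, cause, effect)):
        return False
    for sub in subsets(cause):  # AC3: no strict non-empty sub-assignment works
        if 0 < len(sub) < len(cause):
            part = {x: cause[x] for x in sub}
            if ac1(eqs, u, part, effect) and ac2(eqs, u, part, effect):
                return False
    return True


# Disjunctive toy model (hypothetical): fire if lightning or match.
eqs = [("fire", lambda v: v["lightning"] | v["match"])]
u = {"lightning": 1, "match": 1}
fire = lambda v: v["fire"] == 1
print(is_actual_cause(eqs, u, {"lightning": 1}, fire))              # False
print(is_actual_cause(eqs, u, {"lightning": 1, "match": 1}, fire))  # True
```

In this disjunctive model, neither input alone passes AC2, because the other input keeps the effect alive; only the joint assignment is an actual cause.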
3 Running Example


Consider a security-critical setting with two security levels: a high-security level $h$ and a low-security level $l$. Inputs and outputs labeled as high-security, denoted by $hi$ and $ho$ respectively, are confidential and thus only visible to the user or, e.g., to admins. Inputs and outputs labeled as low-security, denoted by $li$ and $lo$ respectively, are public and are considered to be observable by an attacker.

Our system of interest is modeled by the state graph representation shown in Fig. 1, which is treated as a black box by an attacker. The system is run without any low-security inputs, but branches depending on the given high-security inputs. If, in one of the first two steps of an execution, a high-security input $hi$ is encountered, the system outputs only the high-security variable $ho$ directly afterwards and, in the subsequent steps, both outputs, regardless of inputs. If no high-security input is given in the first step, the low-security output $lo$ is enabled and, after the second step, again both outputs are enabled, regardless of what input is fed into the system.

Fig. 1. State graph representation of our example system.

A prominent example hyperproperty is observational determinism from the introduction, which states that any sequence of low-security inputs always produces the same low-security outputs, regardless of what the high-security inputs are: $\varphi = \forall \pi. \forall \pi'. \square(li_\pi \leftrightarrow li_{\pi'}) \rightarrow \square(lo_\pi \leftrightarrow lo_{\pi'})$. The formula states that all traces $\pi$ and $\pi'$ must agree in the low-security outputs if they agree in the low-security inputs. Our system at hand does not satisfy observational determinism, because the low-security outputs in the first two steps depend on the present high-security inputs. Running MCHyper, a model checker for HyperLTL, results in the following counterexample: $t_1 = \{\}\{lo\}\{ho, lo\}^\omega$ and $t_2 = \{hi\}\{hi, ho\}\{ho, lo\}^\omega$. With the same low-security input (none), the traces produce different low-security outputs by visiting $s_1$ or $s_2$ on the way to $s_3$.
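The running example can be simulated explicitly. The transition function below is our reconstruction of Fig. 1 from the prose (states s0, s1, s2, s3 labeled with no outputs, $lo$, $ho$, and both outputs, respectively); it reproduces the counterexample traces $t_1$ and $t_2$ but is an assumption, not the original figure. The check od_violated is only a bounded test on finite prefixes, not HyperLTL model checking.

```python
from typing import FrozenSet, List

F = frozenset
LABEL = {"s0": F(), "s1": F({"lo"}), "s2": F({"ho"}), "s3": F({"ho", "lo"})}


def delta(s: str, inputs: FrozenSet[str]) -> str:
    """Transitions of Fig. 1 as reconstructed from the prose (an assumption)."""
    if s in ("s0", "s1") and "hi" in inputs:
        return "s2"  # hi in one of the first two steps: only ho afterwards
    if s == "s0":
        return "s1"  # no hi in the first step: lo is enabled
    return "s3"      # otherwise both outputs from now on


def run(inputs: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Finite prefix of T(a), starting in s0."""
    s, trace = "s0", []
    for a in inputs:
        trace.append(a | LABEL[s])
        s = delta(s, a)
    return trace


def od_violated(t: List[FrozenSet[str]], u: List[FrozenSet[str]]) -> bool:
    """Bounded check: li agrees everywhere on the prefix, but lo differs somewhere."""
    same_li = all(("li" in a) == ("li" in b) for a, b in zip(t, u))
    same_lo = all(("lo" in a) == ("lo" in b) for a, b in zip(t, u))
    return same_li and not same_lo


t1 = run([F(), F(), F(), F()])
t2 = run([F({"hi"}), F({"hi"}), F(), F()])
print(t1 == [F(), F({"lo"}), F({"ho", "lo"}), F({"ho", "lo"})])              # True
print(t2 == [F({"hi"}), F({"hi", "ho"}), F({"ho", "lo"}), F({"ho", "lo"})])  # True
print(od_violated(t1, t2))  # True
```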

In this paper, our goal is to explain the violation of a HyperLTL formula 
on such a counterexample. Following Halpern and Pearl’s explanation frame- 
work [46], an actual cause that is considered to be possibly true or possibly false 
constitutes an explanation for the user. We only consider causes over input vari- 
ables, which can be true and false in any model. Hence, finding an explanation 
amounts to answering which inputs caused the violation on a specific counterex- 
ample. Before we answer this question for HyperLTL and the corresponding 
counterexamples given by sets of traces (see Sect. 4), we first illustrate Halpern 
and Pearl’s actual causality (see Sect. 2.2) with the above running example. 




Example 1. Finite executions of a system can be modeled in Halpern and Pearl’s causal models. Consider inputs as exogenous variables $\mathcal{U} = \{hi_0, hi_1\}$ and outputs as endogenous variables $\mathcal{V} = \{lo_1, lo_2, ho_1, ho_2\}$. The indices model at which step of the execution the variable appears. We omit the inputs at the third position and the outputs at the first position because they are not relevant for the following exposition. We have that $\mathcal{D}(Y) = \{0, 1\}$ for every $Y \in \mathcal{U} \cup \mathcal{V}$. Now, the following manually constructed structural equations encode the transitions: (1) $lo_1 = \neg hi_0$, (2) $ho_1 = hi_0$, (3) $lo_2 = \neg hi_1 \vee \neg lo_1$, and (4) $ho_2 = lo_1 \vee ho_1$. Consider context $\vec{u} = \{hi_0 = 0, hi_1 = 1\}$, effect $\varphi \equiv (lo_1 = 1 \vee lo_2 = 1)$, and candidate cause $hi_0 = 0$. Because of (1), we have that $(M, \vec{u}) \models hi_0 = 0$ and $(M, \vec{u}) \models lo_1 = 1$, hence AC1 is satisfied. Regarding AC2, this example allows us to illustrate the need for contingencies to accurately determine the actual cause: If we only consider intervening on the candidate cause $hi_0 = 0$, we still have $(M, \vec{u})[hi_0 \leftarrow 1] \models \varphi$, because with $lo_1 = 0$ and (3) it follows that $(M, \vec{u})[hi_0 \leftarrow 1] \models lo_2 = 1$. However, in the actual world, the second high input has no influence on the effect. We can control for this by considering the contingency $lo_2 = 0$, which is satisfied in the actual world, but not after the intervention on $hi_0$. Because of this contingency, we then have that $(M, \vec{u})[hi_0 \leftarrow 1, lo_2 \leftarrow 0] \models \neg\varphi$, and hence, AC2 holds. Because a singleton set automatically satisfies AC3, we can infer that the first high input $hi_0$ was the actual cause for any low output to be enabled in the actual world. Note that, intuitively, the contingency allows us to ignore some of the structural equations by ignoring the value they assign to $lo_2$ in this context. Our definitions in Sect. 4 will allow similar modifications for counterexamples to hyperproperties.
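The computations in Example 1 can be replayed with a few lines of Python. This sketch hard-codes the structural equations (1) to (4) exactly as stated above; the function name solve and the 0/1 encoding are assumptions made for illustration.

```python
from typing import Dict

Assignment = Dict[str, int]
EXOGENOUS = ("hi0", "hi1")


def solve(u: Assignment, fixed: Assignment) -> Assignment:
    """(M, u)[fixed]: forced variables keep their value, others follow (1)-(4)."""
    v = {**u, **{x: b for x, b in fixed.items() if x in EXOGENOUS}}
    equations = [
        ("lo1", lambda w: 1 - w["hi0"]),                     # (1)
        ("ho1", lambda w: w["hi0"]),                         # (2)
        ("lo2", lambda w: (1 - w["hi1"]) | (1 - w["lo1"])),  # (3)
        ("ho2", lambda w: w["lo1"] | w["ho1"]),              # (4)
    ]
    for x, f in equations:
        v[x] = fixed[x] if x in fixed else f(v)
    return v


u = {"hi0": 0, "hi1": 1}
effect = lambda v: v["lo1"] == 1 or v["lo2"] == 1

actual = solve(u, {})
print(actual["lo1"], actual["lo2"], effect(actual))  # 1 0 True  (AC1)
print(effect(solve(u, {"hi0": 1})))                  # True: lo2 takes over
print(effect(solve(u, {"hi0": 1, "lo2": 0})))        # False: AC2 with contingency lo2 = 0
```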


4 Causality for Hyperproperty Violations 


Our goal in this section is to formally define actual causality for the violation of a hyperproperty described by a general HyperLTL formula $\varphi$, observed in a counterexample to $\varphi$. Such a counterexample is given by a trace assignment to the trace variables appearing in $\varphi$. Note that, for universal quantifiers, the assignment of a single trace to the bound variable suffices to define a counterexample. For existential quantifiers, this is not the case: to prove that an existential quantifier cannot be instantiated, we need to show that no system trace satisfies the formula in its body, i.e., provide a proof for the whole system. In this work, we are interested in explaining violations of hyperproperties, and not proofs of their satisfaction [16]. Hence, we limit ourselves to instantiations of the outermost universal quantifiers of a HyperLTL formula, which can be returned by model checkers like MCHyper [24,35]. Since our goal is to explain counterexamples, restricting ourselves to results returned by existing model checkers is reasonable. Note that MCHyper can still handle formulas of the form $\forall^n \exists^m \psi$ where $\psi$ is quantifier-free, including interesting information flow policies like generalized noninterference [61]. The returned counterexample then only contains $n$ traces that instantiate the universal quantifiers; the existential quantifiers are not instantiated for the above reason. In the following, we restrict ourselves to formulas and counterexamples of this form.


Definition 2 (Counterexample). Let $T$ be a transition system and denote $traces(T) := Tr$, and let $\varphi$ be a HyperLTL formula of the form $\forall \pi_1 \ldots \forall \pi_k. \psi$, where $\psi$ is a HyperLTL formula that does not start with $\forall$. A counterexample to $\varphi$ in $T$ is a partial trace assignment $\Gamma : \{\pi_1, \ldots, \pi_k\} \to Tr$ such that $\Gamma, 0 \models_{Tr} \neg\psi$.


For ease of notation, we sometimes refer to $\Gamma$ simply as the tuple of its instantiations $\Gamma = (\Gamma(\pi_1), \ldots, \Gamma(\pi_k))$. In terms of Halpern and Pearl’s actual causality as outlined in Sect. 2.2, a counterexample describes the actual world at hand, which we want to explain. As a next step, we need to define an appropriate language to reason about possible causes and contingencies in our counterexample. We will use sets of events, i.e., values of atomic propositions at a specific position of a specific trace in the counterexample.


Definition 3 (Event). An event is a tuple $e = (l_a, n, t)$ such that $l_a = a$ or $l_a = \neg a$ for some atomic proposition $a \in AP$, $n \in \mathbb{N}$ is a point in time, and $t \in (2^{AP})^\omega$ is a trace of a system $T$. We say that a counterexample $\Gamma = (t_1, \ldots, t_k)$ satisfies a set of events $C$, and denote $\Gamma \models C$, if for every event $(l_a, n, t) \in C$ the following two conditions hold:

1. $t = t_i$ for some $i \in \{1, \ldots, k\}$, i.e., all events in $C$ reason about traces in $\Gamma$,
2. $l_a = a$ iff $a \in t_i[n]$, i.e., $a$ holds on trace $t_i$ of the counterexample at time $n$.
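Events and the satisfaction relation $\Gamma \models C$ can be encoded directly. In this sketch (an assumption for illustration), a trace of the counterexample is identified by its 0-based index in $\Gamma$ instead of by the trace itself, traces are finite prefixes long enough for all events, and an event $(l_a, n, t_i)$ is written ((a, positive), n, i). The function name satisfies is hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Trace = List[FrozenSet[str]]  # finite prefix, long enough for all events
# Event (l_a, n, i): l_a is (a, True) for a and (a, False) for not-a, n is the
# time point, and i is the 0-based index of the trace in the counterexample.
Event = Tuple[Tuple[str, bool], int, int]


def satisfies(gamma: List[Trace], C: Set[Event]) -> bool:
    """Gamma |= C for a counterexample gamma = (t_1, ..., t_k)."""
    for (a, positive), n, i in C:
        if not 0 <= i < len(gamma):         # condition 1: the event is about a trace of gamma
            return False
        if (a in gamma[i][n]) != positive:  # condition 2: l_a = a iff a in t_i[n]
            return False
    return True


F = frozenset
t1 = [F(), F({"lo"}), F({"ho", "lo"})]
t2 = [F({"hi"}), F({"hi", "ho"}), F({"ho", "lo"})]
gamma = [t1, t2]
print(satisfies(gamma, {(("hi", True), 0, 1)}))   # True: hi holds on t_2 at time 0
print(satisfies(gamma, {(("hi", True), 0, 0)}))   # False: hi does not hold on t_1 at time 0
print(satisfies(gamma, {(("lo", False), 1, 1)}))  # True: lo does not hold on t_2 at time 1
```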


We assume that the set $AP$ is a disjoint union of input and output propositions, that is, $AP = I \cup O$. We say that $(l_a, n, t)$ is an input event if $a \in I$, and we call it an output event if $a \in O$. We denote the set of input events by $IE$ and the set of output events by $OE$. These events have a direct correspondence with the variables appearing in Halpern and Pearl’s causal models: we can identify input events with exogenous variables (because their value is determined by factors outside of the system) and output events with endogenous variables.

We define a cause as a set of input events, while an effect is a possibly infinite Boolean formula over $OE$. Note that, similar to [37], every HyperLTL formula can be represented as a first-order formula over events, e.g., $\forall \pi \forall \pi'. \square(a_\pi \leftrightarrow a_{\pi'}) \equiv \forall \pi \forall \pi'. \bigwedge_{n \in \mathbb{N}} ((a, n, \pi) \leftrightarrow (a, n, \pi'))$. For some set of events $S$, let $\tau^+_{S,k} = \{a \in AP \mid (a, k, \tau) \in S\}$ denote the set of atomic propositions defined positively by $S$ on trace $\tau$ at position $k$. Dually, we define $\tau^-_{S,k} = \{a \in AP \mid (\neg a, k, \tau) \in S\}$.

In order to define actual causality for hyperproperties, we need to formally define how we obtain the counterfactual executions under some contingency for the case of events on infinite traces. We define a contingency as a set of output events. Mapping Halpern and Pearl’s definition to transition systems, contingencies reset outputs in the counterfactual traces back to their value in the original counterexample, which amounts to changing the state of the system, and then following the transition function from the new state. For a given trace of the counterexample, we describe all possible behaviors under arbitrary contingencies with the help of a counterfactual automaton. The concrete contingency on a trace is defined by additional input variables. In the following, let $I_C = \{o^C \mid o \in O\}$ be a set of auxiliary input variables expressing whether a contingency is invoked at the given step of the execution, and let $c : O \to I_C$ be a function s.t. $c(o) = o^C$.


Definition 4 (Counterfactual Automaton). Let $T = (S, s_0, AP, \delta, l)$ be a system with $S = 2^O$, i.e., every state is uniquely labeled, and there exists a state for every combination of outputs. Let $\pi = \pi_0 \ldots \pi_{j-1} (\pi_j \ldots \pi_n)^\omega \in traces(T)$ be a trace of $T$ in a finite, lasso-shaped representation. The counterfactual automaton $T^C_\pi = (S \times \{0, \ldots, n\}, (s_0, 0), I_C \cup I \cup (O \cup \{0, \ldots, n\}), \delta^C, l^C)$ is defined as follows:

- $\delta^C((s, k), Y) = (s', k')$ where $k' = j$ if $k = n$, else $k' = k + 1$, and $l(s') = \{o \in O \mid (o \in l(\delta(s, Y \cap I)) \wedge c(o) \notin Y) \vee (o \in \pi_{k'} \wedge c(o) \in Y)\}$,
- $l^C(s, k) = l(s) \cup \{k\}$.


A counterfactual automaton is effectively a chain of copies of the original system, of the same length as the counterexample. An execution through the counterfactual automaton starts in the first copy, corresponding to the first position in the counterexample trace, and then moves through the chain until it eventually loops back from copy $n$ to copy $j$. A transition in the counterfactual automaton can additionally specify setting some output variable $o$ as a contingency if the auxiliary input variable $o^C$ is enabled. In this case, the execution will move to a state in the next copy of the chain where all the outputs are as usual, except $o$, which will have the same value as in the counterexample $\pi$. Note that, under the assumption that all states of the original system are uniquely labeled and there exists a state for every combination of output variables, the function $\delta^C$ is uniquely determined.¹ A counterfactual automaton for our running example is described in the full version of this paper [22].
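The transition function $\delta^C$ can be sketched for the running example, whose four states are exactly the four output sets over $O = \{lo, ho\}$, so the assumption $S = 2^O$ holds. The names cf_step and c, the spelling "lo^C" of the auxiliary input, and the transitions (reconstructed from the prose of Sect. 3) are hypothetical; the reading that the contingency copies $o$ from $\pi$ at the position $k'$ being entered follows the explanation above.

```python
from typing import FrozenSet, List, Tuple

Out = FrozenSet[str]  # with S = 2^O, a state is identified with its output set
F = frozenset
INPUTS, OUTPUTS = F({"hi"}), F({"lo", "ho"})


def c(o: str) -> str:
    return o + "^C"  # hypothetical spelling of the auxiliary input c(o) = o^C


def delta(s: Out, A: FrozenSet[str]) -> Out:
    """Running-example transitions, reconstructed from the prose (an assumption)."""
    if s in (F(), F({"lo"})) and "hi" in A:
        return F({"ho"})
    if s == F():
        return F({"lo"})
    return F({"ho", "lo"})


def cf_step(pi: List[FrozenSet[str]], j: int, state: Tuple[Out, int],
            Y: FrozenSet[str]) -> Tuple[Out, int]:
    """delta^C((s, k), Y) for the lasso pi_0 ... pi_{j-1} (pi_j ... pi_n)^omega."""
    s, k = state
    n = len(pi) - 1
    k_next = j if k == n else k + 1
    usual = delta(s, Y & INPUTS)  # l(delta(s, Y ∩ I)), since states are output sets
    s_next = F(o for o in OUTPUTS
               if (o in usual and c(o) not in Y) or (o in pi[k_next] and c(o) in Y))
    return s_next, k_next


t2 = [F({"hi"}), F({"hi", "ho"}), F({"ho", "lo"})]  # t_2 with loop start j = 2
q = cf_step(t2, 2, (F(), 0), F())                 # hi removed at time 0
q = cf_step(t2, 2, q, F({"hi", c("lo")}))         # hi at time 1, contingency on lo
print(sorted(q[0]), q[1])                         # ['ho', 'lo'] 2
```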

Next, we need to define how we intervene on a set of traces with a candidate cause given as a set of input events, and a contingency given as a set of output events. We define an intervention function, which transforms a trace of our original automaton into an input sequence of a counterfactual automaton.


Definition 5 (Intervention). For a cause $C \subseteq IE$, a contingency $W \subseteq OE$ and a trace $\pi$, the function $intervene : (2^{AP})^\omega \times 2^{IE} \times 2^{OE} \to (2^{I \cup I_C})^\omega$ returns a trace such that for all $k \in \mathbb{N}$ the following holds: $intervene(\pi, C, W)[k] = (\pi[k] \setminus \pi^+_{C,k}) \cup \pi^-_{C,k} \cup \{c(o) \mid o \in \pi^+_{W,k+1} \cup \pi^-_{W,k+1}\}$. We lift the intervention function to counterexamples given as a tuple $\Gamma = (\pi_1, \ldots, \pi_k)$ as follows: $intervene(\Gamma, C, W) = (T^C_{\pi_1}(intervene(\pi_1, C, W)), \ldots, T^C_{\pi_k}(intervene(\pi_k, C, W)))$.


Intuitively, the intervention function flips all the events that appear in the cause $C$: If some $a \in I$ appears positively in the candidate cause $C$, it will appear negatively in the resulting input sequence, and vice versa. For a contingency $W$, the intervention function enables the corresponding auxiliary inputs of the counterfactual automaton at the appropriate time point, irrespective of their value, as the counterfactual automaton will take care of matching the value of the atomic proposition to its value in the original counterexample $\Gamma$.
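A sketch of the intervention function for a single finite trace prefix follows. It flips the input events of $C$ and raises auxiliary inputs for $W$. The one-step offset (an auxiliary input given at step $k$ fixes the output of the state entered at step $k + 1$, which is how the contingency $(lo, 2, t_2)$ takes effect in Example 2) is our assumption about the indexing, as are the names intervene and c and the event encoding (proposition, positive, time).

```python
from typing import FrozenSet, List, Set, Tuple

F = frozenset
INPUTS = F({"hi"})
Ev = Tuple[str, bool, int]  # event on a single trace: (proposition, positive?, time)


def c(o: str) -> str:
    return o + "^C"  # hypothetical spelling of the auxiliary input o^C


def intervene(pi: List[FrozenSet[str]], C: Set[Ev], W: Set[Ev],
              steps: int) -> List[FrozenSet[str]]:
    """First `steps` letters of intervene(pi, C, W) for a single trace prefix pi."""
    seq = []
    for k in range(steps):
        pos = {a for (a, p, m) in C if p and m == k}      # pi^+_{C,k}: drop these
        neg = {a for (a, p, m) in C if not p and m == k}  # pi^-_{C,k}: add these
        # Assumed offset: c(o) at step k fixes o in the state entered at step k + 1.
        cont = {c(o) for (o, _, m) in W if m == k + 1}
        # Only input propositions are kept; outputs are recomputed by T^C_pi.
        seq.append(F(((pi[k] & INPUTS) - pos) | neg | cont))
    return seq


t2 = [F({"hi"}), F({"hi", "ho"}), F({"ho", "lo"})]
C1 = {("hi", True, 0)}
W2 = {("lo", True, 2)}
print(intervene(t2, C1, set(), 3))  # hi removed at time 0
print(intervene(t2, C1, W2, 3))     # additionally lo^C at time 1
```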


1 The same reasoning can be applied to arbitrary systems by considering, for contingencies, the largest sets of outputs for which the assumption holds, with the caveat that the counterfactual automaton may model fewer contingencies. Consequently, computed causes may be less precise in case multiple causes appear in the counterexample.
4.1 Actual Causality for HyperLTL Violations


We are now ready to formalize what constitutes an actual cause for the violation 
of a hyperproperty described by a HyperLTL formula. 


Definition 6 (Actual Causality for HyperLTL). Let $\Gamma$ be a counterexample to a HyperLTL formula $\varphi = \forall \pi_1 \ldots \forall \pi_k. \psi$ in a system $T$. The set $C$ is an actual cause for the violation of $\varphi$ on $T$ if the following conditions hold.

SAT: $\Gamma \models C$.

CF: There exists a contingency $W$ and a non-empty subset $C' \subseteq C$ such that $\Gamma \models W$ and $intervene(\Gamma, C', W) \models_{traces(T)} \psi$.

MIN: $C$ is minimal, i.e., no subset of $C$ satisfies SAT and CF.


Unlike in Halpern and Pearl’s definition (see Sect. 2.2), the condition SAT requires $\Gamma$ to satisfy only the cause, as we already know that the effect $\neg\varphi$, i.e., the violation of the specification, is satisfied by virtue of $\Gamma$ being a counterexample. CF is the counterfactual condition corresponding to AC2 in Halpern and Pearl’s definition, and it states that after intervening on the cause, under a certain contingency, the set of traces satisfies the property. (Note that we use a conjunction of two statements here, while Halpern and Pearl use an implication. This is because they implicitly quantify universally over the values of the variables in the set $\vec{W}$ (which should be as in the actual world), whereas in our setting the set of contingencies already defines explicit values.) MIN is the minimality criterion directly corresponding to AC3.
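Definition 6 can be checked by exhaustive search once its two semantic ingredients are available as oracles. In the hypothetical sketch below, sat decides $\Gamma \models D$ and fixed_by decides whether $intervene(\Gamma, D, W)$ satisfies $\psi$. The usage oracle is hand-encoded from Example 2 and our reconstruction of Fig. 1: flipping both high inputs turns $t_2$ into $t_1$, flipping only the first one restores observational determinism only under $W_2$, and flipping only the second one leaves the difference at position 1. With these assumptions, the search confirms that only the first high input is an actual cause.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, Iterator

Events = FrozenSet[Hashable]


def nonempty_subsets(s: Iterable[Hashable]) -> Iterator[Events]:
    s = list(s)
    for r in range(1, len(s) + 1):
        for comb in combinations(s, r):
            yield frozenset(comb)


def is_actual_cause(C: Events, sat: Callable[[Events], bool],
                    fixed_by: Callable[[Events, Events], bool],
                    contingencies: Iterable[Events]) -> bool:
    """SAT, CF and MIN of Definition 6 for a finite candidate set C.

    sat(D): Gamma |= D.
    fixed_by(D, W): intervene(Gamma, D, W) satisfies psi.
    contingencies: candidate sets W, each already satisfied by Gamma.
    """
    contingencies = list(contingencies)

    def cf(D: Events) -> bool:
        return any(fixed_by(Dp, W) for Dp in nonempty_subsets(D) for W in contingencies)

    if not (sat(C) and cf(C)):
        return False
    # MIN: no proper subset of C satisfies SAT and CF.
    return not any(sat(D) and cf(D) for D in nonempty_subsets(C) if D != C)


# Oracle hand-encoded from Example 2 (hypothetical event encoding).
E0, E1 = ("hi", 0, "t2"), ("hi", 1, "t2")
W2 = frozenset({("lo", 2, "t2")})
sat = lambda D: D <= {E0, E1}  # both high inputs occur on t_2
fixed_by = lambda D, W: E0 in D and (E1 in D or W2 <= W)
Ws = [frozenset(), W2]

print(is_actual_cause(frozenset({E0}), sat, fixed_by, Ws))      # True
print(is_actual_cause(frozenset({E0, E1}), sat, fixed_by, Ws))  # False
print(is_actual_cause(frozenset({E1}), sat, fixed_by, Ws))      # False
```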


Example 2. Consider our running example from Sect. 3, i.e., the system from Fig. 1 and the counterexample to observational determinism $\Gamma = (t_1, t_2)$. Let us consider what it means to intervene on the cause $C_1 = \{(hi, 0, t_2)\}$. Note that we have $\Gamma \models C_1$, hence the condition SAT is satisfied. For CF, let us first consider an intervention without contingencies. This results in $intervene(\Gamma, C_1, \emptyset) = (t_1, t_2') = (t_1, \{\}\{hi, lo\}\{ho\}\{ho, lo\}^\omega)$. However, $intervene(\Gamma, C_1, \emptyset) \models_{traces(T)} \neg\psi$, because the low outputs of $t_1$ and $t_2'$ differ at the third position: $lo \in t_1[2]$ and $lo \notin t_2'[2]$. This is because now the second high input takes effect, which was preempted by the first cause in the actual counterexample. The contingency $W_2 = \{(lo, 2, t_2)\}$ now allows us to control for this by modifying the state after taking the second high input as follows: $intervene(\Gamma, C_1, W_2) = (t_1, t_2'') = (t_1, \{\}\{hi, lo\}\{ho, lo\}\{ho, lo\}^\omega)$. Note that $t_2''$ is not a trace of the model depicted in Fig. 1, because there is no transition that explains the step from $t_2''[1]$ to $t_2''[2]$. It is, however, a trace of the counterfactual automaton $T^C_{t_2}$ (see full version [22]), which encodes the set of counterfactual worlds for the trace $t_2$. The fact that we consider executions that are not part of the original system allows us to infer that only the first high input was an actual cause in our running example. Disregarding contingencies, we would need to consider both high inputs as an explanation for the violation of observational determinism, even though the second high input had no influence. Our treatment of contingencies corresponds directly to Halpern and Pearl’s causal models, which allow one to ignore certain structural equations, as outlined in Example 1.
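Example 2 can be replayed end to end by combining the reconstructed system, the intervention function, and the counterfactual automaton. The following self-contained sketch rests on assumptions that are ours, not the paper's: the transitions of Fig. 1 are reconstructed from the prose of Sect. 3, the auxiliary input is spelled "lo^C", and contingencies are applied with a one-step offset. It yields $t_2'$ and $t_2''$ as stated in the example and checks, on a four-step prefix, whether the low outputs still differ from $t_1$.

```python
from typing import FrozenSet, List, Set, Tuple

F = frozenset
INPUTS, OUTPUTS = F({"hi"}), F({"lo", "ho"})
Ev = Tuple[str, bool, int]  # event on t_2: (proposition, positive?, time point)


def c(o: str) -> str:
    return o + "^C"  # hypothetical spelling of the auxiliary input o^C


def delta(s: FrozenSet[str], A: FrozenSet[str]) -> FrozenSet[str]:
    """Fig. 1 reconstructed from the prose; states are identified with output sets."""
    if s in (F(), F({"lo"})) and "hi" in A:
        return F({"ho"})
    if s == F():
        return F({"lo"})
    return F({"ho", "lo"})


def at(pi: List[FrozenSet[str]], j: int, k: int) -> FrozenSet[str]:
    """Position k of the lasso pi_0 ... pi_{j-1} (pi_j ... pi_n)^omega."""
    n = len(pi) - 1
    return pi[k] if k <= n else pi[j + (k - j) % (n - j + 1)]


def intervene(pi, j, C: Set[Ev], W: Set[Ev], steps: int):
    return [F(((at(pi, j, k) & INPUTS) - {a for a, p, m in C if p and m == k})
              | {a for a, p, m in C if not p and m == k}
              | {c(o) for o, _, m in W if m == k + 1})  # assumed one-step offset
            for k in range(steps)]


def run_cf(pi, j, Ys):
    """Finite prefix of the trace of T^C_pi on the input sequence Ys."""
    n = len(pi) - 1
    s, k, trace = F(), 0, []
    for Y in Ys:
        trace.append((Y & INPUTS) | s)
        k_next = j if k == n else k + 1
        usual = delta(s, Y & INPUTS)
        s = F(o for o in OUTPUTS
              if (o in usual and c(o) not in Y) or (o in pi[k_next] and c(o) in Y))
        k = k_next
    return trace


def lo_differs(t, u) -> bool:
    return any(("lo" in a) != ("lo" in b) for a, b in zip(t, u))


t1 = [F(), F({"lo"}), F({"ho", "lo"}), F({"ho", "lo"})]
t2, j2 = [F({"hi"}), F({"hi", "ho"}), F({"ho", "lo"})], 2
C1, W2 = {("hi", True, 0)}, {("lo", True, 2)}

t2_p = run_cf(t2, j2, intervene(t2, j2, C1, set(), 4))  # {}{hi,lo}{ho}{ho,lo}
t2_pp = run_cf(t2, j2, intervene(t2, j2, C1, W2, 4))    # {}{hi,lo}{ho,lo}{ho,lo}
print(lo_differs(t1, t2_p), lo_differs(t1, t2_pp))      # True False
```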
Remark: With our definitions, we strictly generalize Halpern and Pearl’s actual causality to reactive systems modeled as Moore machines and effects expressed as HyperLTL formulas. Their structural equation models can be encoded in a one-step Moore machine; effects specifying a Boolean combination of primitive events can be encoded in the more expressive logic HyperLTL. Just like for Halpern and Pearl, our actual causes are not unique: there can exist several different actual causes, but the set of all actual causes is always unique. It is also possible that no actual cause exists: if the effect occurs on all system traces, there may be no actual cause on a given individual trace.


4.2 Finding Actual Causes with Model Checking 


In this section, we consider the relationship between finding an actual cause for 
the violation of a HyperLTL formula starting with a universal quantifier and 
model checking of HyperLTL. We show that the problem of finding an actual 
cause can be reduced to a model checking problem where the generated formula 
for the model checking problem has one additional quantifier alternation. While 
there might be a reduction resulting in a more efficient encoding, our current 
result suggests that causality checking is the harder problem. The key idea of 
our reduction is to use counterfactual automata (that encode the given coun- 
terexample and the possible counterfactual traces) together with the HyperLTL 
formula described in the proof to ensure the conditions SAT, CF, and MIN on 
the witnesses for the model checking result. 


Proposition 1. We can reduce the problem of finding an actual cause for the violation of a HyperLTL formula starting with a universal quantifier to the HyperLTL model checking problem with one additional quantifier alternation.


Proof. Let $\Gamma = (t_1, \ldots, t_k)$ be a counterexample for the formula $\forall \pi_1 \ldots \forall \pi_k. \varphi$ where $\varphi$ is a HyperLTL formula that does not have a universal first quantifier. We provide the proof for the case of $\Gamma = (t_1, t_2)$ for readability reasons, but it can be extended to any natural number $k$. We assume that $t_1, t_2$ have some $\omega$-regular representation, as otherwise the initial problem of computing causality is not well defined. That is, we denote $t_i = u_i (v_i)^\omega$ such that $|u_i \cdot v_i| = n_i$.

In order to find an actual cause, we need to find a pair of traces $t_1', t_2'$ that are counterfactuals for $t_1, t_2$ and satisfy the property $\varphi$, such that the changes from $t_1, t_2$ to $t_1', t_2'$ are minimal with respect to set containment. Changes in inputs between $t_i$ and $t_i'$ in the loop part $v_i$ should reoccur in $t_i'$ repeatedly. Note that the differences between the counterexample $(t_1, t_2)$ and the witness of the model checking problem $(t_1', t_2')$ encode the actual cause, i.e., in case of a difference, the cause contains the event that is present on the counterexample. To reason about these changes, we use the counterfactual automaton $T^C_{t_i}$ for each $t_i$, which also allows us to search for the contingency $W$ as part of the input sequence of $T^C_{t_i}$. Note that each $T^C_{t_i}$ consists of $n_i$ copies that indicate in which step the automaton is with respect to $t_i$ and its loop $v_i$. For $m \geq |u_i|$, we label each state $(s, m)$ in $T^C_{t_i}$ with the additional label $L_{m,i}$, to indicate that the system is now in the loop part of $t_i$. In addition, we add to the initial state of $T^C_{t_i}$ the label $l_{init_i}$, and we add to the initial state of the system $T$ the label $l_{orig}$. The formula $\psi^i_{loop}$ below states that the trace $\pi$ begins its run from the initial state of $T^C_{t_i}$ (and thus stays in this component through the whole run), and that every time $\pi$ visits a state on the loop, the same input sequence is observed. This way we enforce the periodic input behavior of the traces $t_1, t_2$ on $t_1', t_2'$.


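$$\psi^i_{loop}(\pi) = l_{init_i,\pi} \wedge \bigwedge_{|u_i| \leq m < n_i} \bigvee_{A \subseteq I} \square \Big( L_{m,i,\pi} \rightarrow \big( \bigwedge_{a \in A} a_\pi \wedge \bigwedge_{a \in I \setminus A} \neg a_\pi \big) \Big)$$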


For a subset of locations $N \subseteq [1, n_i]$ and a subset of input propositions $A \subseteq I$,
we define $\psi^i_{diff}[N, A](\pi)$, which states that $\pi$ differs from $t_i$ in at least all events
$(a, m, t_i)$ for $a \in A$, $m \in N$, and the formula $\psi^i_{eq}[N, A](\pi)$, which states that for
all events that are not defined by $A$ and $N$, $\pi$ is equal to $t_i$.


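$$\psi^i_{diff}[N, A](\pi) := \bigwedge_{j \in N,\, a \in A} \mathsf{X}^j\, (a_\pi \neq a_{t_i})$$

$$\psi^i_{eq}[N, A](\pi) := \bigwedge_{j \notin N,\, a \in I} \mathsf{X}^j\, (a_\pi \leftrightarrow a_{t_i}) \wedge \bigwedge_{j \in [1, n_i],\, a \notin A} \mathsf{X}^j\, (a_\pi \leftrightarrow a_{t_i})$$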
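To make the two predicates concrete, the sketch below evaluates $\psi^i_{diff}[N, A]$ and $\psi^i_{eq}[N, A]$ directly on explicit finite traces, assuming a trace is a Python list whose entry $j-1$ is the set of propositions true at location $j \in [1, n_i]$. The helper names are hypothetical; in the construction above these checks are temporal formulas handled by the model checker.

```python
Trace = list  # assumed encoding: trace[j - 1] is the frozenset of props true at location j


def differs(pi: Trace, t: Trace, j: int, a: str) -> bool:
    """True iff proposition a has different values on pi and t at location j."""
    return (a in pi[j - 1]) != (a in t[j - 1])


def psi_diff(pi: Trace, t: Trace, N: set, A: set) -> bool:
    """pi differs from t in (at least) every event (a, j) with j in N, a in A."""
    return all(differs(pi, t, j, a) for j in N for a in A)


def psi_eq(pi: Trace, t: Trace, N: set, A: set, inputs: set) -> bool:
    """pi agrees with t on every input event outside N x A, for j in [1, n]."""
    n = len(t)
    return all(
        not differs(pi, t, j, a)
        for j in range(1, n + 1)
        for a in inputs
        if j not in N or a not in A  # complement of N x A
    )


t1 = [frozenset({"hi"}), frozenset()]
pi = [frozenset(), frozenset()]  # flips hi at location 1 only
print(psi_diff(pi, t1, {1}, {"hi"}), psi_eq(pi, t1, {1}, {"hi"}, {"hi", "in"}))  # True True
```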
We now define the formula $\psi^i_{min}$, which states that the set of inputs (and
locations) on which trace $\pi$ differs from $t_i$ is not contained in the corresponding
set for $\pi'$. We only check locations up until the length $n_i$ of $t_i$.


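$$\psi^i_{min}(\pi, \pi') := \bigwedge_{N \subseteq [1, n_i]} \ \bigwedge_{A \subseteq I} \Big(\big(\psi^i_{diff}[N, A](\pi) \wedge \psi^i_{eq}[N, A](\pi)\big) \rightarrow \neg \psi^i_{eq}[N, A](\pi')\Big)$$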
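The minimality that $\psi^i_{min}$ encodes is set containment on difference sets. Purely as an illustration, the sketch below filters an explicit list of candidate counterfactual traces (each assumed to already satisfy the property) down to those whose set of differing input events with $t_i$ has no strict subset among the other candidates. Explicit enumeration is exactly what the HyperLTL encoding avoids; the function names and trace encoding (a list of proposition sets) are hypothetical.

```python
def diff_set(pi: list, t: list, inputs: set) -> frozenset:
    """All input events (a, j), j in [1, n], on which pi and t disagree."""
    return frozenset(
        (a, j)
        for j in range(1, len(t) + 1)
        for a in inputs
        if (a in pi[j - 1]) != (a in t[j - 1])
    )


def minimal_counterfactuals(candidates: list, t: list, inputs: set) -> list:
    """Keep candidates whose difference set is minimal w.r.t. set containment."""
    diffs = [diff_set(c, t, inputs) for c in candidates]
    return [
        c for c, d in zip(candidates, diffs)
        if not any(other < d for other in diffs)  # '<' is strict subset
    ]


t = [frozenset({"a", "b"})]
flip_a = [frozenset({"b"})]
flip_b = [frozenset({"a"})]
flip_ab = [frozenset()]
print(minimal_counterfactuals([flip_a, flip_b, flip_ab], t, {"a", "b"}))
```

Here `flip_ab` is dropped because its difference set strictly contains that of `flip_a` (and of `flip_b`), while the two single flips are incomparable and both kept.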


Denote $\varphi := Q_1 \pi_1 \cdots Q_n \pi_n.\, \varphi'(\pi_1, \pi_2)$, where $Q_i \in \{\forall, \exists\}$ and $\pi_i$ are trace
variables for $i \in [1, n]$. The formula $\psi_{cause}$ described below states that the two
traces $\pi'_1$ and $\pi'_2$ are part of the systems $T^C_{t_1}, T^C_{t_2}$, have the same loop structure as $t_1$ and $t_2$, and satisfy $\varphi$. That is, these traces can be obtained by changing
the original traces $t_1, t_2$ and avoid the violation.


$$\psi_{cause}(\pi'_1, \pi'_2) := \varphi'(\pi'_1, \pi'_2) \wedge \bigwedge_{i=1,2} \psi^i_{loop}(\pi'_i)$$


Finally, $\psi_{actual}$ described below states that the counterfactuals $\pi'_1, \pi'_2$ correspond to a minimal change in the input events with respect to $t_1, t_2$. All other
traces that the formula reasons about start at the initial state of the original
system and thus are not affected by the counterfactual changes. We verify $\psi_{actual}$
against the product automaton $T \times T^C_{t_1} \times T^C_{t_2}$ to find traces $\pi'_i \in T^C_{t_i}$ that
witness the presence of a cause, counterfactual and contingency.


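$$\psi_{actual} := \exists \pi'_1. \exists \pi'_2. \forall \pi''_1. \forall \pi''_2.\ Q_1 \pi_1 \cdots Q_n \pi_n.\ \psi_{cause}(\pi'_1, \pi'_2) \wedge \bigwedge_{i=1,2} \big(l_{i,\pi'_i} \wedge l_{i,\pi''_i}\big) \wedge \bigwedge_{i \in [1,n]} l_{orig,\pi_i} \wedge \Big(\psi_{cause}(\pi''_1, \pi''_2) \rightarrow \bigwedge_{i=1,2} \psi^i_{min}(\pi'_i, \pi''_i)\Big)$$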
Then, if there exist two such traces $\pi'_1, \pi'_2$ in the system $T \times T^C_{t_1} \times T^C_{t_2}$,
they correspond to a minimal cause for the violation. Otherwise, there are no
traces of the counterfactual automata that can be obtained from $t_1, t_2$ using
counterfactual reasoning and satisfy the formula $\varphi$.


We have shown that we can use HyperLTL model checking to find an actual 
cause for the violation of a HyperLTL formula. The resulting model checking 
problem has an additional quantifier alternation which suggests that identifying 
actual causes is a harder problem. Therefore, we restrict ourselves to finding 
actual causes for violations of universal HyperLTL formulas. This keeps the 
algorithms we present in the next section practical as we start without any 
quantifier alternation and need to solve a model checking problem with a single 
quantifier alternation. While this restriction excludes some interesting formulas, 
many can be strengthened into this fragment such that we are able to handle close
approximations (cf. [25]). Any additional quantifier alternation from the original
formula carries over to an additional quantifier alternation in the resulting model 
checking problem which in turn leads to an exponential blow-up. The scalability 
of our approach is thus limited by the complexity of the model checking problem. 


5 Computing Causes for Counterexamples 


In this section, we describe our algorithm for finding actual causes of hyperproperty
violations. Our algorithm is implemented on top of MCHyper [35], a model
checker for hardware circuits and the alternation-free fragment of HyperLTL. In
case of a violation, our analysis enriches the provided counterexample with the
actual cause, which can explain the reason for the violation to the user.

We first provide an overview of our algorithm and then discuss each step in
detail. First, we compute an over-approximation of the cause using a satisfiability
analysis over transitions taken in the counterexample. This analysis results in
a set of events $\hat{C}$. As we show in Proposition 2, every actual cause $C$ for the
violation is a subset of $\hat{C}$. In addition, in Proposition 3 we show that the set
$\hat{C}$ satisfies conditions SAT and CF. To ensure MIN, we search for the smallest
subset $C \subseteq \hat{C}$ that satisfies SAT and CF. This set $C$ is then our minimal and
therefore actual cause.

To check condition CF, we need to check the counterfactual of each candidate
cause $C$, and potentially also look for contingencies for $C$. We structure our discussion as follows. We first discuss the calculation of the over-approximation $\hat{C}$
(Sect. 5.1), then we present the ActualCause algorithm that identifies a minimal
subset of $\hat{C}$ that is an actual cause (Sect. 5.2), and finally we discuss in detail
the calculation of contingencies (Sect. 5.3). In the following sections, we use a
reduction of the universal fragment of HyperLTL to LTL, and exploit the
linear translation of LTL to alternating automata, as we now briefly outline.
HyperLTL to LTL. Let $\varphi$ be a $\forall^n$-HyperLTL formula and $\Gamma$ be the counterexample.
We construct an LTL formula $\varphi'$ from $\varphi$ as follows [31]: atomic propositions
indexed with different trace variables are treated as different atomic propositions
and trace quantifiers are eliminated. For example, $\forall \pi_1 \forall \pi_2.\, a_{\pi_1} \wedge a_{\pi_2}$ results in the
LTL formula $a_1 \wedge a_2$. As for $\Gamma$, we use the same renaming in order to zip all traces
into a single trace, for which we assume the finite representation $t^* = u^* \cdot (v^*)^\omega$,
which is also the structure of the model checker's output. The trace $t^*$ is a violation of the formula $\varphi'$, i.e., $t^*$ satisfies $\neg\varphi'$. We denote $\psi := \neg\varphi'$. We can then
assume, for implementation purposes, that the specification (and its violation)
is an LTL formula, and the counterexample is a single trace. After our causal
analysis, the translation back to a cause over hyperproperties is straightforward
as we maintain all information about the different traces in the counterexample.
Note that this translation works due to the synchronous semantics of HyperLTL.
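A minimal sketch of the zipping step, assuming each trace is a finite list of proposition sets of equal length (for a lasso, the prefix and loop parts would be zipped separately in the same way) and that renaming appends the trace index to each proposition, so $a$ on $t_1$ becomes `a_1`. This naming scheme and the function names are assumptions chosen to mirror $a_1 \wedge a_2$ above; the mapping back is what makes the translation of the cause to the hyperproperty level straightforward.

```python
def zip_traces(traces: dict) -> list:
    """Merge {index: [set of props per step]} into one trace over renamed props.

    Proposition a on trace k becomes "a_k", so the originating trace of every
    event is preserved and can be recovered after the causal analysis.
    """
    lengths = {len(trace) for trace in traces.values()}
    if len(lengths) != 1:
        raise ValueError("synchronous zipping requires traces of equal length")
    (length,) = lengths
    return [
        frozenset(f"{a}_{k}" for k, trace in traces.items() for a in trace[step])
        for step in range(length)
    ]


def unzip_event(prop: str) -> tuple:
    """Map a renamed proposition such as "hi_2" back to ("hi", "2")."""
    name, _, k = prop.rpartition("_")
    return name, k


zipped = zip_traces({1: [frozenset({"hi"}), frozenset({"lo"})],
                     2: [frozenset(), frozenset()]})
print(zipped)  # [frozenset({'hi_1'}), frozenset({'lo_1'})]
print(unzip_event("tb_secret_1"))  # ('tb_secret', '1')
```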


Finite Trace Model Checking Using Alternating Automata. In verifying condition CF (that is, in computing counterfactuals and contingencies), we need to
apply finite trace model checking, as we want to check if the modified trace at
hand still violates the specification $\varphi'$, that is, satisfies $\psi$. To this end, we use
the linear algorithm of [36], which exploits the linear translation of $\psi$ to an alternating automaton $\mathcal{A}_\psi$ and checks the satisfaction of $\psi$ using a backwards analysis.
An alternating automaton [68] generalizes non-deterministic and universal
automata, and its transition relation is a Boolean function over the states. The
run of an alternating automaton is then a tree run that captures the conjunctions in
the formula. We use the algorithm of [36] as a black box (see App. A.2 in [22] for
a formal definition of alternating automata and App. A.3 in [22] for the translation from LTL to alternating automata). For the computation of contingencies
we use an additional feature of the algorithm of [36]: the algorithm returns
an accepting run tree $\mathcal{T}$ of $\mathcal{A}_\psi$ on $t^*$, with annotations of nodes that represent
atomic subformulas of $\psi$ that take part in the satisfaction of $\psi$. We use this
feature also in Sect. 5.1 when calculating the set of candidate causes.
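The algorithm of [36] is used as a black box. To illustrate what checking an LTL formula on a lasso-shaped trace $u \cdot v^\omega$ amounts to, the sketch below uses a different, simpler technique: a direct recursive evaluator that computes, bottom-up, the set of positions of $u \cdot v$ where each subformula holds, with a least fixed point for Until. It is not the alternating-automaton algorithm of [36], and the tuple encoding of formulas is an assumption.

```python
# Assumed formula encoding: ("true",), ("ap", name), ("not", f), ("and", f, g),
# ("or", f, g), ("X", f), ("U", f, g); F and G are derived below.


def eventually(f):
    return ("U", ("true",), f)  # F f == true U f


def always(f):
    return ("not", eventually(("not", f)))  # G f == not F not f


def positions(f, word, loop_start):
    """Positions i of word = u + v at which f holds on the lasso u v^omega."""
    n = len(word)
    # the successor of the last position jumps back to the start of the loop v
    succ = [i + 1 if i + 1 < n else loop_start for i in range(n)]
    op = f[0]
    if op == "true":
        return set(range(n))
    if op == "ap":
        return {i for i in range(n) if f[1] in word[i]}
    if op == "not":
        return set(range(n)) - positions(f[1], word, loop_start)
    if op in ("and", "or"):
        left = positions(f[1], word, loop_start)
        right = positions(f[2], word, loop_start)
        return left & right if op == "and" else left | right
    if op == "X":
        inner = positions(f[1], word, loop_start)
        return {i for i in range(n) if succ[i] in inner}
    if op == "U":
        hold = positions(f[1], word, loop_start)
        result = set(positions(f[2], word, loop_start))
        changed = True
        while changed:  # least fixed point of: f U g = g or (f and X(f U g))
            changed = False
            for i in range(n):
                if i not in result and i in hold and succ[i] in result:
                    result.add(i)
                    changed = True
        return result
    raise ValueError(f"unknown operator {op!r}")


def holds(f, u, v):
    """Does the lasso u v^omega (v non-empty) satisfy f at its first position?"""
    if not v:
        raise ValueError("loop part v must be non-empty")
    return 0 in positions(f, list(u) + list(v), len(u))


u, v = [{"a"}], [{"b"}, set()]
print(holds(always(eventually(("ap", "b"))), u, v),
      holds(eventually(always(("ap", "b"))), u, v))  # True False
```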


5.1 Computing the Set of Candidate Causes 


The events that might have been a part of the cause of the violation are in
fact all events that appear on the counterexample, or, equivalently, all events
that appear in $u^*$ and $v^*$. Note that due to the finite representation, this is
a finite set of events. Yet, not all events in this set can cause the violation.
In order to remove events that could not have been a part of the cause, we
perform an analysis of the transitions of the system taken during the execution
of $t^*$. With this analysis we detect which events appearing in the trace locally
cause the respective transitions, and thus might be part of the global cause.
Events that did not trigger a transition in this specific trace cannot be a part
of the cause. Note that causing a transition and being an actual cause are two
different notions: actual causality is defined over the behaviour of the system,
not on individual traces. We denote the over-approximation of the cause as $\hat{C}$.
Formally, we represent each transition as a Boolean function over inputs and
states. Let $\delta_n$ denote the formula representing the transition of the system taken
when reading $t^*[n]$, and let $c_{a,n,i}$ be a Boolean variable that corresponds to the
event $(a_i, n, t^*)$.² Denote $\varphi^t_n := \bigwedge_{a_i \in t^*[n]} c_{a,n,i} \wedge \bigwedge_{a_i \notin t^*[n]} \neg c_{a,n,i}$; that is, $\varphi^t_n$
expresses the exact set of events in $t^*[n]$. In order to find events that might
trigger the transition $\delta_n$, we compute the unsatisfiable core of $\varphi_n := (\neg\delta_n) \wedge \varphi^t_n$.
Intuitively, the unsatisfiable core of $\varphi_n$ is the set of events that force the system
to take this specific transition. For every $c_{a,n,i}$ (or $\neg c_{a,n,i}$) in the unsatisfiable
core that is also a part of $\varphi^t_n$, we add $(a, n, t_i)$ (or $(\neg a, n, t_i)$) to $\hat{C}$.
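The implementation obtains these cores with a SAT solver (Sect. 6). The sketch below shows the same idea on a toy transition without a solver: it decides unsatisfiability of $\neg\delta_n \wedge \varphi^t_n$ by brute force over the unconstrained variables, and shrinks the literals of $\varphi^t_n$ by deletion, which yields a minimal core (a SAT solver may return a core that is not minimal). The transition predicate, the variable names, and the helper names are hypothetical.

```python
from itertools import product


def forces_transition(delta, literals, variables):
    """True iff (not delta) AND literals is unsatisfiable, i.e. every assignment
    agreeing with `literals` takes the transition `delta` (brute force)."""
    free = [x for x in variables if x not in literals]
    for values in product([False, True], repeat=len(free)):
        env = {**literals, **dict(zip(free, values))}
        if not delta(env):
            return False
    return True


def unsat_core(delta, literals, variables):
    """Deletion-based shrinking of `literals` to a minimal unsatisfiable core."""
    if not forces_transition(delta, literals, variables):
        raise ValueError("not delta AND literals is satisfiable: no core exists")
    core = dict(literals)
    for x in list(literals):
        trial = {k: val for k, val in core.items() if k != x}
        if forces_transition(delta, trial, variables):
            core = trial  # x is not needed to force the transition
    return core


def delta_n(env):
    # hypothetical transition taken at step n: requires hi and no reset
    return env["hi"] and not env["reset"]


observed = {"hi": True, "reset": False, "noise": True}  # literals of phi^t_n
print(unsat_core(delta_n, observed, ["hi", "reset", "noise"]))
# {'hi': True, 'reset': False}
```

Each literal left in the core would then be mapped to an event $(a, n, t_i)$ or $(\neg a, n, t_i)$ and added to $\hat{C}$; the unrelated `noise` input is discarded.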

We use unsatisfiable cores in order to find input events that are necessary in
order to take a transition. However, this might not be enough. There are cases
in which inputs that appear in the formula $\psi$ are not detected using this method,
as they are not essential in order to take a transition; however, they might be
considered a part of the actual cause, as negating them can avoid the violation.
Therefore, as a second step, we apply the algorithm of [36] on the annotated
automaton $\mathcal{A}_\psi$ in order to find the specific events that affect the satisfaction of
$\psi$, and we add these events to $\hat{C}$. Thus, the unsatisfiable core approach provides
us with inputs that affect the computation and might cause the violation even
though they do not appear in the formula itself, while the alternating automaton
allows us to find inputs that are not essential for the computation, but might
still be a part of the cause as they appear in the formula.


Proposition 2. The set $\hat{C}$ is indeed an over-approximation of the cause for the
violation. That is, every actual cause $C$ for the violation is a subset of $\hat{C}$.


Proof (sketch). Let $e = (a, n, t)$ be an event such that $e$ is not in the unsatisfiable
core of $\varphi_n$, and does not directly affect the satisfaction of $\psi$ according to the
alternating automaton analysis. That is, the transition corresponding to $\delta_n$ is
taken regardless of $e$, and thus all future events on $t$ remain the same regardless
of the valuation of $e$. In addition, the valuation of the formula $\psi$ is the same
regardless of $e$, since: (1) $e$ does not directly affect the satisfaction of $\psi$; (2) $e$
does not affect future events on $t$ (and obviously it does not affect past events).
Therefore, every set $C'$ such that $e \in C'$ is not minimal, and does not form a
cause. Since the above is true for all events $e \notin \hat{C}$, it holds that $C \subseteq \hat{C}$ for every
actual cause $C$.


Proposition 3. The set $\hat{C}$ satisfies conditions SAT and CF.


Proof. The condition SAT is satisfied as we add to $\hat{C}$ only events that indeed
occur on the counterexample trace. For CF, consider that $\hat{C}$ is a superset of
the actual cause $C$, so the same contingency and counterfactual of $C$ will also
apply for $\hat{C}$. This is because in order to compute counterfactuals for $\hat{C}$ we are allowed
to flip any subset of its events, and any subset of $C$ is also a subset of $\hat{C}$.


2 That is, $\neg c_{a,n,i}$ corresponds to the event $(\neg a_i, n, t^*)$. Recall that the atomic propositions on the zipped trace $t^*$ are annotated with the original trace $t_i$ from $\Gamma$.
Algorithm 1: ActualCause($\varphi$, $\Gamma$, $\hat{C}$)
Input: Hyperproperty $\varphi$, counterexample $\Gamma$ violating $\varphi$, and a set of candidate
causes $\hat{C}$ for which conditions SAT and CF hold.
Output: A set of input events $C$ which is an actual cause for the violation.
1  for $i \in [1, \ldots, |\hat{C}| - 1]$ do
2    for $C \subseteq \hat{C}$ with $|C| = i$ do
3      let $\Gamma' = intervene(\Gamma, C, \emptyset)$;
4      if $\Gamma' \models \varphi$ then
5        return $C$;
6      else
7        $W = ComputeContingency(\varphi, \Gamma, C)$;
8        if $W \neq \emptyset$ then
9          return $C$;
10 return $\hat{C}$;


In addition, in computing contingencies, we are allowed to flip any subset of outputs as long as they agree with the counterexample trace, which is independent of
$C$ and $\hat{C}$.


5.2 Checking Actual Causality 


Due to Proposition 2, we know that in order to find an actual cause, we only
need to consider subsets of $\hat{C}$ as candidate causes. In addition, since $\hat{C}$ satisfies
condition SAT, so do all of its subsets. We thus only need to check conditions
CF and MIN for subsets of $\hat{C}$. Our actual causality computation, presented in
Algorithm 1, is as follows. We start with the set $\hat{C}$, which satisfies SAT and CF.
We then check if there exists a smaller cause that satisfies CF. This is
done by iterating over all subsets $C'$ of $\hat{C}$, ordered by size and starting with the
smallest ones, and checking if the counterfactual for $C'$ manages to avoid the
violation and, if not, whether there exists a contingency for $C'$. If the answer to
one of these questions is yes, then $C'$ is a minimal cause that satisfies SAT, CF,
and MIN, and thus we return $C'$ as our actual cause. We now elaborate on CF
and MIN.
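The search order of Algorithm 1 can be sketched in Python as follows. The two model checking steps are abstracted into caller-supplied oracle functions with hypothetical names: `avoids_violation` stands in for intervening with the empty contingency and model checking, and `compute_contingency` stands in for Algorithm 2. Events are sorted by `repr` only to make the enumeration order reproducible.

```python
from itertools import combinations


def actual_cause(candidates, avoids_violation, compute_contingency):
    """Sketch of the search in Algorithm 1.

    candidates: the over-approximation C-hat (assumed to satisfy SAT and CF).
    avoids_violation(C): hypothetical oracle, True iff intervening on C with
        the empty contingency yields traces that satisfy the property.
    compute_contingency(C): hypothetical oracle returning a non-empty
        contingency for C, or an empty set if none exists.
    """
    events = sorted(candidates, key=repr)  # fixed order for reproducibility
    for size in range(1, len(events)):  # smallest subsets first (MIN)
        for subset in combinations(events, size):
            cause = frozenset(subset)
            if avoids_violation(cause):
                return cause  # plain counterfactual suffices
            if compute_contingency(cause):
                return cause  # counterfactual with a contingency
    return frozenset(events)  # no strictly smaller subset satisfies CF


c_hat = {("hi", 0, "t1"), ("hi", 0, "t2"), ("in", 1, "t1")}


def needs_both_hi(cause):
    # hypothetical oracle: the violation disappears only if both hi events flip
    return {("hi", 0, "t1"), ("hi", 0, "t2")} <= cause


print(actual_cause(c_hat, needs_both_hi, lambda cause: set()))
```

Because subsets are enumerated by increasing size, the first subset that passes either check is minimal with respect to cardinality, and hence no strict subset of it satisfies CF.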


CF. As we have mentioned above, checking condition CF is done in two stages:
checking for counterfactuals and computing contingencies. We first show that we
do not need to consider all possible counterfactuals, but only one counterfactual
for each candidate cause.


Proposition 4. In order to check if a candidate cause C is an actual cause it 
is enough to test the one counterfactual where all the events in C are flipped. 


Proof. Assume that there is a strict subset $C'$ of $C$ such that we only need to flip
the valuations of events in $C'$ in order to find a counterfactual or contingency;
thus $C'$ satisfies CF. Since $C'$ is a smaller cause than $C$, we will find it during
the minimality check.
Algorithm 2: ComputeContingency($\varphi$, $\Gamma$, $C$)
Input: Hyperproperty $\varphi$, a counterexample $\Gamma$ and a potential cause $C$.
Output: A set of output events $W$ which is a contingency for $\varphi$, $\Gamma$ and $C$, or $\emptyset$ if
no contingency is found.
1  let $t^*$ be the zipped trace of $\Gamma$, $\varphi'$ be the LTL formula obtained from $\varphi$, and $\psi := \neg\varphi'$;
2  let $\mathcal{A}_\psi$ be the alternating automaton for $\psi$;
3  let $t^f$ be the counterfactual trace obtained from $t^*$ by flipping all events in $C$;
4  let $N$ be the set of events derived from the annotated run tree of $\mathcal{A}_\psi$ on $t^f$;
5  let $O' := \{(a, n, t^*) \in O^E \mid a \in t^*[n] \leftrightarrow a \notin t^f[n]\}$;
6  for every subset $W' \subseteq (N \cap O')$, and then for every other subset $W' \subseteq O'$ do
7    $t' := intervene(t^*, C, W')$;
8    if $t' \models \varphi'$ then
9      return $W'$;
10 return $\emptyset$;


We assume that CF holds for the input set $\hat{C}$ and check if it holds for any
smaller subset $C \subsetneq \hat{C}$. CF holds for $C$ if (1) flipping all events in $C$ is enough to
avoid the violation of $\varphi$, or if (2) there exists a non-empty set of contingencies
for $C$ that ensures that $\varphi$ is not violated. The computation of contingencies is
described in Algorithm 2. Verifying condition CF involves model checking traces
against an LTL formula, as we check in Algorithm 1 (line 3) whether the property $\varphi$ is
still violated on the counterfactual trace with the empty contingency, and on the
counterfactual traces resulting from the different contingency sets we consider
in Algorithm 2 (line 7). In both scenarios, we apply finite trace model checking,
as described at the beginning of Sect. 5 (as we assume lasso-shaped traces).


MIN. To check if $\hat{C}$ is minimal, we need to check if there exists a subset of $\hat{C}$
that satisfies CF. We check CF for all subsets, starting with the smallest ones,
and report the first subset that satisfies CF as our actual cause. (Note that we
already established that $\hat{C}$ and all of its subsets satisfy SAT.)


5.3 Computing Contingencies 


Recall that the role of contingencies is to eliminate the effect of other possible 
causes from the counterfactual world, in case these causes did not affect the 
violation in the actual world. More formally, in computing contingencies we look 
for a set W of output events such that changing these outputs from their value in 
the counterfactual to their value in the counterexample t” results in avoiding the 
violation. Note that the inputs remain as they are in the counterfactual. We note 
that the problem of finding contingencies is hard, and in general is equivalent 
to the problem of model checking. This is because we need to consider all traces
that are the result of changing some subset of events (output + time step) from
the counterfactual back to the counterexample, and to check if there exists a
trace in this set that avoids the violation. Unfortunately, we are unable to avoid
an exponential complexity in the size of the original system in the worst case.
However, our experiments show that in practice, most cases do not require the
use of contingencies.

Our algorithm for computing contingencies (Algorithm 2) works as follows.
Let $t^f$ be the counterfactual trace. As a first step, we use the annotated run tree
$\mathcal{T}$ of the alternating automaton $\mathcal{A}_\psi$ on $t^f$ to detect output events that appear
in $\psi$ and take part in satisfying $\psi$. Subsets of these output events are our first
candidates for contingencies as they are directly related to the violation (Algorithm 2, lines 4–9). If we are not able to find a contingency, we continue to check
all possible subsets of output events that differ from the original counterexample
trace. We test the different outputs by feeding the counterfactual automaton of
Definition 4 with additional inputs from the set $I^C$. The resulting trace is then
our candidate contingency, which we try to verify against $\varphi$. The number of different input sequences is bounded by the size of the product of the counterfactual
automaton and the automaton for $\psi$, and thus the process terminates.
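The prioritised search of Algorithm 2 (first subsets of $N \cap O'$, then the remaining subsets of $O'$) can be sketched as below. The model checking step is again a caller-supplied oracle with a hypothetical name, and the empty contingency is skipped on the assumption that Algorithm 1 has already tested it. Enumeration is exponential in $|O'|$, in line with the worst-case discussion above.

```python
from itertools import chain, combinations


def nonempty_subsets(items):
    """All non-empty subsets of items, smallest first, in a fixed order."""
    ordered = sorted(items, key=repr)
    return chain.from_iterable(
        combinations(ordered, k) for k in range(1, len(ordered) + 1)
    )


def compute_contingency(differing_outputs, relevant, satisfies_after_reset):
    """Sketch of the search order in Algorithm 2.

    differing_outputs: O', output events whose value differs between the
        counterexample and the counterfactual.
    relevant: N, events reported by the annotated run tree.
    satisfies_after_reset(W): hypothetical oracle, True iff resetting the
        outputs in W to their counterexample values avoids the violation.
    Returns the first contingency found, or an empty frozenset.
    """
    preferred = set(differing_outputs) & set(relevant)
    tried = set()
    for subset in chain(nonempty_subsets(preferred),
                        nonempty_subsets(differing_outputs)):
        w = frozenset(subset)
        if w in tried:
            continue  # subsets of N ∩ O' are not tested twice
        tried.add(w)
        if satisfies_after_reset(w):
            return w
    return frozenset()


outputs = {"o1", "o2", "o3"}
print(compute_contingency(outputs, {"o3"}, lambda w: True))  # frozenset({'o3'})
```

Even though every non-empty subset would work here, the output event reported by the run tree (`o3`) is tried first.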


Theorem 1 (Correctness). Our algorithm is sound and complete. That is, let
$\Gamma$ be a counterexample with a finite representation to a $\forall^n$-HyperLTL formula
$\varphi$. Then, our algorithm returns an actual cause for the violation, if one exists.


Proof. Soundness. Since we verify each candidate set of inputs according to
the conditions SAT, CF and MIN, every output of our algorithm
is indeed an actual cause. Completeness. If there exists a cause, then due to
Proposition 2, it is a subset of the finite set $\hat{C}$. Since in the worst case we test
every subset of $\hat{C}$, if there exists a cause we will eventually find it.


6 Implementation and Experiments 


We implemented Algorithm 1 and evaluated it on publicly available example
instances of HyperVis [48] for which the state graphs were available. In the
following, we provide implementation details, report on the running time, and
show the usefulness of the implementation by comparing it to the highlighting output of HyperVis. Our implementation is written in Python and uses py-aiger [69]
and Spot [27]. We compute the candidate cause according to Sect. 5.1 with pysat [50], using Glucose 4 [3,66], building on Minisat [66]. We ran experiments on
a MacBook Pro with a 3.3 GHz Dual-Core Intel Core i7 processor and 16 GB
RAM.


Experimental Results. The results of our experimental evaluation can be found in
Table 1. We report on the size of the analyzed counterexample $|\Gamma|$, the size of the
violated formula $|\varphi|$, how long it took to compute the first, over-approximated
cause (see time($\hat{C}$)) and state the approximation $\hat{C}$ itself, the number of computed
minimal causes #($C$) and the time it took to compute all of them (see time($\forall C$)).
The Running Example is described in Sect. 3, the instance Security in & out


3 Our prototype implementation and the experimental data are both available at:
https://github.com/reactive-systems/explaining-hyperproperty-violations.
Table 1. Experimental results of our implementation. Times are given in ms.

Instance                  |Γ|   |φ|   time(Ĉ)   Ĉ                                                      #(C)   time(∀C)
Running example           10    9     19        ¬hi^0_t1, hi^0_t2                                      2      55
Security in & out         35    19    292       hig hip o ohif o Shia higy hip, s hifa hifo           8      798
Drone example 1           24    19    33        bound?, bound}, up, .rupz, bound?., abound?, sup       5      367
Drone example 2           18    36    31        bound}, abound}, upto                                  3      256
Asymmetric arbiter ’19    28    35    53        see App. A.4 in [22]                                   10     490
Asymmetric arbiter        72    35    70        see App. A.4 in [22]                                   24     1480


refers to a system which leaks high security input by not satisfying a noninter- 
ference property, the Drone examples consider a leader-follower drone scenario, 
and the Asymmetric Arbiter instances refer to arbiter implementations that do 
not satisfy a symmetry constraint. Specifications can be found in the full version 
of this paper [22]. 

Our first observation is that the cause candidate $\hat{C}$ can be efficiently computed thanks to the iterative computation of unsatisfiable cores (Sect. 5.1). The
cause candidate provides a tight over-approximation of possible minimal causes.
As expected, the runtime for finding minimal causes increases for larger counterexamples. However, as our experiments show, the overhead is manageable,
because we optimize the search for all minimal causes by only considering
subsets of $\hat{C}$ instead of naively going over every combination of input events (see
Proposition 2). Compared to the computationally heavy task of model checking to get a counterexample, our approach incurs little additional cost, which
matches our theoretical results (see Proposition 1). During our experiments, we
have found that computing the candidate $\hat{C}$ first has, in addition to providing
a powerful heuristic, another benefit: even when the computation of minimal
causes becomes increasingly expensive, $\hat{C}$ can serve as an intermediate result for
the user. By filtering for important inputs, such as high security inputs, $\hat{C}$ already
gives great insight into why the property was violated. In the asymmetric arbiter
instance, for example, the input events $(\neg tb\_secret, 3, t_0)$ and $(tb\_secret, 3, t_1)$ of
$\hat{C}$, which cause the violation, immediately catch the eye (cf. App. A.4 in [22]).


Comparison to HyperVis. HyperVis [48] is a tool for visualizing counterexamples returned from the HyperLTL model checker MCHyper [35]. It highlights the
events in the trace that it considers responsible for the violation based on the
formula and the set of traces, without considering the system model. However,
violations of many relevant security policies such as observational determinism
are not caused by events whose atomic propositions appear in the formula, as can
be seen in our running example (see Sect. 3 and Example 2). When running the
highlight function of HyperVis for the counterexample traces $t_1, t_2$ on Running
example, the output events $(lo, 1, t_1)$ and $(\neg lo, 1, t_2)$ will be highlighted, neglecting the decisive high security input $hi$. Using our method additionally reveals
the input events $(\neg hi, 0, t_1)$ and $(hi, 0, t_2)$, i.e., an actual cause (see Table 1).
This pattern can be observed throughout all considered instances in our experiments. For instance, in the Asymmetric arbiter instance mentioned above, the
input events causing the violation also do not occur in the formula (see App. A.5
in [22]) and thus HyperVis does not highlight this important cause for the violation.


7 Related Work 


With the introduction of HyperLTL and HyperCTL* [20], temporal hyperproperties have been studied extensively: satisfiability [29,38,60], model checking [34,35,49], program repair [11], monitoring [2,10,32,67], synthesis [80],
and expressiveness studies [23,37,53]. Causal analysis of hyperproperties has
been studied theoretically based on counterfactual builders [40] rather than on
actual causality, which we use in our work. Explanation methods [4] exist for trace properties [5,39,41,42,70] and are integrated in several model checkers [14,15,19]. Minimization [54] has been studied, as well as analyzing several system traces
together [9,43,65]. There exists work on explaining counterexamples for function
block diagrams [51,63]. MODCHK uses a causality analysis [7] returning an over-approximation, while we provide minimal causes. Lastly, there are approaches
which define actual causes for the violation of a trace property using Event Order
Logic [13,56,57].


8 Conclusion 


We present an explanation method for counterexamples to hyperproperties 
described by HyperLTL formulas. We lift Halpern and Pearl’s definition of actual 
causality to effects described by hyperproperties and counterexamples given as 
sets of traces. Like the definition that inspired us, we allow modifications of the 
system dynamics in the counterfactual world through contingencies, and define 
these possible counterfactual behaviors in an automata-theoretic approach. The 
evaluation of our prototype implementation shows that our method is practically applicable and significantly improves the state of the art in explaining
counterexamples returned by a HyperLTL model checker.
Abstract. The most widely used Zero-Knowledge (ZK) protocols
require provers to prove they know a solution to a computational problem 
expressed as a Rank-1 Constraint System (R1CS). An R1CS is essentially 
a system of non-linear arithmetic constraints over a set of signals, whose 
security level depends on its non-linear part only, as the linear (additive) 
constraints can be easily solved by an attacker. Distilling the essential 
constraints from an R1CS by removing the part that does not contribute 
to its security is important, not only to reduce costs (time and space) of 
producing the ZK proofs, but also to reveal to cryptographic program- 
mers the real hardness of their proofs. In this paper, we formulate the 
problem of distilling constraints from an R1CS as the (hard) problem of 
simplifying constraints in the realm of non-linearity. To the best of our 
knowledge, it is the first time that constraint-based techniques developed 
in the context of formal methods are applied to the challenging problem 
of analysing and optimizing ZK protocols. 


1 Introduction 


Zero-Knowledge (ZK) protocols [8,15,17,27] enable one party, called prover, to
convince another one, called verifier, that a statement is true without revealing any information beyond the veracity of the “statement”. In this context, we
understand a statement as a relation between an instance, a public input known
to both prover and verifier, and a witness, a private input known only to the
prover, which belongs to a language $\mathcal{L}$ in the nondeterministic polynomial time
(NP) complexity class [5,15]. The most popular, efficient and general-purpose ZK
protocols are ZK-SNARKs: ZK Succinct Non-interactive ARguments of Knowledge. While a proof guarantees the existence of a witness in a language $\mathcal{L}$, an
argument of knowledge proves that, with very high probability, the prover knows
a concrete valid witness in $\mathcal{L}$. A ZK-SNARK does not require interaction between
the prover and the verifier, and regardless of the size of the statement being
proved, the size of the proof is succinct. These appealing properties of ZK-SNARKs have made them crucial tools in many real-world applications
with strong privacy requirements. A prominent such example is Zcash [4]. ZK protocols are also being used in conjunction with smart contracts, in the so-called
ZK-rollups for enhancing the scalability of distributed ledgers [18].

Like most ZK systems, ZK-SNARKs operate in the model of arithmetic circuits, meaning that the NP language $\mathcal{L}$ is that of satisfiable arithmetic circuits.
The gates of an arithmetic circuit consist of additions and multiplications modulo $p$, where $p$ is typically a large prime number of approximately 254 bits [3].
The wires of an arithmetic circuit are called signals, and can carry any value
from the prime finite field $\mathbb{F}_p$. In the ZK context, there is usually a set of public
inputs known both to the prover and the verifier, and the prover proves that she
knows a valid assignment to the rest of signals that satisfies the circuit (i.e., the
witness). Most ZK-SNARK protocols draw from a classical algebraic form for
encoding circuits and wire assignments called rank-1 constraint system (R1CS).
An R1CS encodes a circuit as a set of quadratic constraints over its variables,
so that a correct execution of a circuit is equivalent to finding a satisfying variable assignment. This way, a valid witness for an arithmetic circuit translates
naturally into a solution of its R1CS representation.

Although ZK protocols guarantee that a malicious verifier cannot extract a
witness from a proof, they do not prevent the verifier from attacking the statement directly. Hence, it is important that the prover is aware of the difficulty of the
statement being proved. In this regard, it is challenging for cryptographic developers that apply ZK protocols to complex computations to assess the real hardness
of the produced computational problem, which in turn makes the systems difficult to verify and
audit. This is partly because a syntactic assessment (e.g., based on counting the number of non-linear constraints) can be inaccurate and misleading. This
is the case if the R1CS contains redundant constraints, i.e., constraints that can be
deduced from others or constraints that follow from linear constraints, since they
do not contribute to the hardness of the computational statement. Distilling the
relevant constraints is important, on the one hand, for efficiency, to reduce costs (time
and space) of producing the ZK proofs, and, on the other hand, because redundancy can mislead
developers to believe that the statement is far more complex than it really is. It
is clear that when arithmetic circuits are defined over a finite field of small order,
the problem can be attacked by brute force, and that if the system consists only of linear
constraints, a solution can be found in polynomial time [25]. Moreover, in R1CS-based systems like [17], only multiplication gates add complexity to the statement.
Also note that linear constraints induce a way to compute the value of one signal
from a linear combination of the others, and hence we can easily extend a witness
for the other signals to a witness for all the signals. As a result, the difficulty of
finding a solution to a system relies mostly on the number of non-redundant non-linear constraints.
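The remark that a linear constraint lets one signal be computed from the others can be illustrated with a short sketch. It assumes the linear constraint is already normalized to the form $k_0 + \sum_j k_j s_j = 0$ over $\mathbb{F}_p$ and uses the toy prime $p = 17$ instead of a 254-bit prime; the function name and dictionary encoding are hypothetical, not circom's representation.

```python
def solve_for(coeffs: dict, const: int, target: str, assignment: dict, p: int) -> int:
    """Return the value of `target` forced by const + sum(coeffs[s] * s) = 0 mod p.

    `assignment` must give values for every other signal in `coeffs`, and the
    coefficient of `target` must be non-zero in F_p so that it can be inverted.
    """
    k = coeffs[target] % p
    if k == 0:
        raise ValueError("target does not occur in the constraint")
    rest = const + sum(c * assignment[s] for s, c in coeffs.items() if s != target)
    return (-rest * pow(k, -1, p)) % p  # pow(k, -1, p): modular inverse (Python 3.8+)


# 3*x + 2*y + 5 = 0 over F_17 with y = 4 forces x = 7, since 21 + 8 + 5 = 34 ≡ 0.
x = solve_for({"x": 3, "y": 2}, 5, "x", {"y": 4}, 17)
print(x)  # 7
```

Since every such step takes time linear in the number of signals, extending a partial witness through linear constraints is cheap, which is why they do not add hardness to the statement.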


Contributions. This case study paper applies techniques developed in the context of formal methods to distill constraints from the R1CS systems used by
ZK protocols. The main challenges are related, on the one hand, to reasoning
with non-linear information in a finite field and, on the other hand, to dealing
with very large constraint systems. Briefly, our main contributions are: (1) we
present a formal framework to reason on circuit reduction which generalizes the
application of different existing optimizations and the reduction strategy in which
they are applied, (2) we introduce a concrete new optimization technique based on
Gaussian elimination that allows deducing linear constraints from the non-linear
constraints, (3) we implement our approach within circom [21] (a novel domain-specific language and compiler for defining arithmetic circuits) and also develop an
interface for using it on the R1CS generated by ZoKrates [12], and (4) we experimentally evaluate its performance on multiple real-world circuits (including templates
from the circomlib library [22] and from [12], on implementations of different SHA-2
hash functions, on elliptic curve operations, etc.).


2 Preliminaries 


This section introduces some preliminary notions and notation. We consider $\mathbb{F}_p$,
a finite field of prime order $p$. As usual, $\mathbb{F}_p^n$ is a sequence of $n$ values in $\mathbb{F}_p$.
We drop $p$ from $\mathbb{F}$ when it is irrelevant. An arithmetic circuit (over the field $\mathbb{F}$)
consists of wires (represented by means of signals $s_i \in \mathbb{F}$) connected to gates
(represented by quadratic constraints). Signals can be public or private. We now
define the concepts of quadratic constraints and R1CS over a set of signals.


Definition 1 (R1CS). A quadratic constraint over a set of signals $\{s_1,\dots,s_n\}$ is an equation of the form $Q: A \times B - C = 0$, where $A, B, C \in \mathbb{F}[s_1,\dots,s_n]$ are linear polynomials over the variables $s_1,\dots,s_n$, i.e., $A = a_0 + a_1 s_1 + \dots + a_n s_n$, $B = b_0 + b_1 s_1 + \dots + b_n s_n$, and $C = c_0 + c_1 s_1 + \dots + c_n s_n$, where $a_i, b_i, c_i \in \mathbb{F}$ for all $i \in \{0,\dots,n\}$. A rank-1 constraint system (R1CS) over a set of signals $T$ is a collection of quadratic constraints over $T$.


We say that a quadratic constraint $Q$ is linear when $A$ or $B$ only has the constant term, i.e., $a_i = 0$ for all $i \in \{1,\dots,n\}$ or $b_i = 0$ for all $i \in \{1,\dots,n\}$, and non-linear otherwise. As R1CS systems only contain quadratic constraints, in what follows we simply call them constraints, and specify whether they are linear or not where needed. We use the standard notation $S \models c$ to indicate that a constraint $c$ is deducible from a set of constraints $S$, and $|S|$ for the number of constraints in $S$.


Definition 2 (arithmetic circuit and witness). An (arithmetic) circuit is a tuple $C = (U,V,S)$, where $U$ represents the set of public signals, $V$ represents the set of private signals, and the R1CS $S = \{Q_1,\dots,Q_m\}$ over the signals $U \cup V$ represents the circuit operations. Given an assignment $u$ for $U$, a witness for $C$ is an assignment $v$ for $V$ s.t. $u$ together with $v$ is a solution to the R1CS $S$.


We use the terms circuit, R1CS, and constraint system interchangeably when the signals used in the circuit are clear. Given a circuit $C$ and a public assignment for $U$, a ZK protocol is a mechanism that allows a prover to prove to a verifier that she knows a private assignment for $V$ that, together with the one for $U$, satisfies the R1CS system describing $C$. ZK protocols guarantee that the proof will not reveal any information about $V$.
Example 1. We consider a circuit $C_1 = (U,V,S_1)$ over a finite field $\mathbb{F}$, with $U = \{v,w\}$, $V = \{x,y,z\}$, and $S_1$ given by the following constraint system:

$Q_1: w \times (y + z) - 4x - 10 = 0$, $\quad Q_2: w \times z - w - 3 = 0$,
$Q_3: (x - w + 1) \times v - v + 1 = 0$, $\quad Q_4: y - z - 2 = 0$.

This circuit contains 3 non-linear constraints ($Q_1$, $Q_2$, and $Q_3$) and a linear one ($Q_4$). Because of its small size, we can easily solve the system (i.e., give the value of each signal in terms of only one of them) and find the set of solutions:

$W = \{(v,w,x,y,z) = (1,\ w,\ w - 1,\ 3w^{-1} + 3,\ 3w^{-1} + 1) \mid w \in \mathbb{F} \setminus \{0\}\}$.
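The set $W$ can be checked mechanically when the field is tiny. The following Python sketch is a hypothetical sanity check, not part of the method: it fixes the small prime $P = 7$ (an assumption; the fields used in practice are far too large to enumerate), enumerates all of $\mathbb{F}_7^5$, and compares the solutions of $S_1$ with the closed form $W$.

```python
from itertools import product

P = 7  # hypothetical small prime; real ZK circuits use fields of roughly 254 bits


def inv(a: int) -> int:
    """Multiplicative inverse in F_P (a != 0), by Fermat's little theorem."""
    return pow(a, P - 2, P)


def satisfies_s1(v: int, w: int, x: int, y: int, z: int) -> bool:
    """True iff the assignment satisfies every constraint of S1 modulo P."""
    residues = (
        w * (y + z) - 4 * x - 10,  # Q1
        w * z - w - 3,             # Q2
        (x - w + 1) * v - v + 1,   # Q3
        y - z - 2,                 # Q4
    )
    return all(r % P == 0 for r in residues)


# Exhaustive search over F_P^5 (only feasible because P is tiny).
solutions = {s for s in product(range(P), repeat=5) if satisfies_s1(*s)}

# Closed-form set W from Example 1, parameterised by w in F_P \ {0}.
closed_form = {
    (1, w, (w - 1) % P, (3 * inv(w) + 3) % P, (3 * inv(w) + 1) % P)
    for w in range(1, P)
}

print(len(solutions), solutions == closed_form)  # 6 True
```

There is exactly one solution per non-zero $w$, which matches the parameterisation of $W$.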


A cryptographic problem can be modeled by different circuits producing the same solutions. This relation among circuits can be formalized as circuit equivalence, which is a natural extension of constraint system equivalence. We say that two circuits $C = (U,V,S)$ and $C' = (U,V,S')$ are equivalent, written $C \sim C'$, if $S$ and $S'$ have the same set of solutions. Consequently, equivalent circuits also have the same witnesses for every assignment of $U$.


Example 2. The circuit $C_2 = (U,V,S_2)$, with the same sets of public and private signals $U$ and $V$ as $C_1$ and the R1CS $S_2$ given by the constraints

$Q'_1: w \times y - 3w - 3 = 0$, $\quad Q'_2: y - z - 2 = 0$, $\quad Q'_3: v - 1 = 0$, $\quad Q'_4: x - w + 1 = 0$,

has the same set of solutions (and thus witnesses) as $C_1$. Hence, $C_1 \sim C_2$.
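Circuit equivalence can likewise be tested by brute force over small fields. The sketch below compares the solution sets of $S_1$ and $S_2$ for the primes 5, 7 and 11, which are arbitrary illustrative choices; agreement for a few small primes is a sanity check only, not a proof for the large prime fields used by ZK protocols.

```python
from itertools import product


def residues_s1(v, w, x, y, z):
    """Left-hand sides of the constraints of S1 (Example 1)."""
    return (w * (y + z) - 4 * x - 10, w * z - w - 3,
            (x - w + 1) * v - v + 1, y - z - 2)


def residues_s2(v, w, x, y, z):
    """Left-hand sides of the constraints of S2 (Example 2)."""
    return (w * y - 3 * w - 3, y - z - 2, v - 1, x - w + 1)


def solutions(residues, p):
    """All points of F_p^5 on which every residue vanishes modulo p."""
    return {pt for pt in product(range(p), repeat=5)
            if all(r % p == 0 for r in residues(*pt))}


for p in (5, 7, 11):
    print(p, solutions(residues_s1, p) == solutions(residues_s2, p))
```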


3 A Formal Framework for R1CS Reduction 


R1CS optimizations are applied within state-of-the-art compilers like circom
[21] or ZoKrates [12]. Common to such existing compiler optimizations is the 
application of rules to simplify and eliminate linear constraints and/or to deduce 
information from them. As our first contribution, we present a formal framework 
for R1CS reduction based on a rule-based transformation system which is general 
enough to be a formal basis for developing specific simplification techniques 
and reduction strategies. In particular, the simplifications already applied in the 
above compilers are instantiations of our framework. 

The notion of reduction that our framework formalizes is key to defining the security level of circuits. When two circuits model the same problem, they provide the same level of security. However, an assessment of their security level based on syntactically counting the number of non-linear constraints in the circuits can lead to a wrong understanding/estimation of their security. For instance, circuits $C_1$ and $C_2$ (see Examples 1-2) model the same problem, although $C_2$ needs a single non-linear constraint to define its set of solutions (instead of three, as $C_1$). This happens because some of the non-linear constraints of $C_1$ are not essential and can be substituted by linear constraints. Besides, we can observe in $C_2$ that signals $x$ and $z$ are only involved in linear constraints instead of appearing in non-linear constraints as in $C_1$. In other words, having a circuit with more private signals involved in non-linear constraints (e.g., $C_1$) does not ensure further security if these private


434 E. Albert et al. 


signals can be deduced from linear combinations of the others. We build our notion 
of circuit reduction upon this concept. 


Definition 3 (circuit reduction). Let $C = (U,V,S)$ be a circuit with $U \cup V = \{s_1,\dots,s_n\}$, and $C' = (U,V',S')$ another circuit with $V \subseteq V'$.

(i) We say that $C'$ linearly follows from $C$, denoted by $C \vdash_l C'$, if $\forall s \in V' \setminus V$, $\exists \lambda^s_0, \lambda^s_1, \dots, \lambda^s_n \in \mathbb{F}$ s.t., given an assignment for $U$, every witness $\varphi$ for $C$ extended with $s = \lambda^s_0 + \sum_{i=1}^{n} \lambda^s_i \cdot \varphi(s_i)$ is a witness for $C'$.

(ii) We say that $C'$ reduces to $C$, written $C' \trianglerighteq C$, if $C \vdash_l C'$, $|S'| \geq |S|$, and every witness of $C'$ restricted to $V$ is a witness for $C$ for the same assignment of $U$. We say that $C'$ strictly reduces to $C$, written $C' \triangleright C$, if, in addition, $|S'| > |S|$ or $V \subset V'$.


Intuitively, for every signal in $V$, the values of the two witnesses match, and for the signals in $V' \setminus V$, the value in the witness of $C'$ can be obtained as a linear combination of the values from the assignment for $U$ and $\varphi$.


Example 3. Let $C_3$ be $(\{v,w\}, \{y\}, S_3)$ with $S_3 = \{Q'_1: w \times y - 3w - 3 = 0,\ Q'_3: v - 1 = 0\}$. Let us show that $C_1$ (from Example 1) strictly reduces to $C_3$. From Example 2, we have that every solution of $C_1$ restricted to $\{v,w,y\}$ is also a solution of $C_3$ (since $S_3 \subseteq S_2$ and $C_2 \sim C_1$) and that in every witness $\varphi'$ of $C_2$ we have that $\varphi'(x) = \varphi'(w) - 1$ and $\varphi'(z) = \varphi'(y) - 2$. Therefore, taking $\lambda^x_0 = -1$, $\lambda^x_{pos(w)} = 1$, $\lambda^z_0 = -2$, $\lambda^z_{pos(y)} = 1$ (where the function $pos(s_i)$ abstracts the index $i$ of the variable $s_i$ in the set of signals), we have that $C_3 \vdash_l C_1$. Finally, since $\{y\} \subset \{x,y,z\}$ and, given an assignment for $\{v,w\}$, every witness of $C_1$ restricted to $\{y\}$ is a witness for $C_3$, we can conclude.
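Both conditions used in Example 3 can also be checked exhaustively over a small field. In the following Python sketch, the prime $P = 7$ and the helper names are illustrative assumptions: `extend` applies the coefficients $\lambda^x$ and $\lambda^z$ chosen above, the first loop checks condition (i) of Definition 3, and the second checks that solutions of $C_1$ restricted to $v, w, y$ solve $C_3$.

```python
from itertools import product

P = 7  # hypothetical small prime used only for illustration


def is_solution_c3(v, w, y):
    """S3 = {w*y - 3w - 3 = 0, v - 1 = 0}."""
    return (w * y - 3 * w - 3) % P == 0 and (v - 1) % P == 0


def is_solution_c1(v, w, x, y, z):
    """S1 from Example 1."""
    return all(r % P == 0 for r in (
        w * (y + z) - 4 * x - 10, w * z - w - 3,
        (x - w + 1) * v - v + 1, y - z - 2))


def extend(v, w, y):
    """Linear extension of Example 3: x = -1 + 1*w and z = -2 + 1*y."""
    return v, w, (w - 1) % P, y, (y - 2) % P


# Condition (i): every solution of C3, extended linearly, solves C1.
extended = 0
for v, w, y in product(range(P), repeat=3):
    if is_solution_c3(v, w, y):
        assert is_solution_c1(*extend(v, w, y))
        extended += 1

# Every solution of C1 restricted to (v, w, y) solves C3.
for v, w, x, y, z in product(range(P), repeat=5):
    if is_solution_c1(v, w, x, y, z):
        assert is_solution_c3(v, w, y)

print(extended)  # 6, one per w in F_P \ {0}
```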


We now present a set of transformation rules that ensure circuit reducibility. The transformation is based on finding linear consequences of the constraint system to guarantee that the transformed set of constraints linearly follows from the original system. Our transformation rules operate on pairs in $\mathcal{K} \times \mathcal{S}_L$, where $\mathcal{K}$ is the set of arithmetic circuits and $\mathcal{S}_L$ is the set of linear constraint systems. As usual, we use infix notation, writing $(C, S_L) \Rightarrow (C', S'_L)$, and denote by $\Rightarrow^+$ and $\Rightarrow^*$ its transitive and reflexive-transitive closure, respectively. Given a circuit $C$, if $(C, \emptyset) \Rightarrow^* (C', S_L)$, then $C'$ is a reduction for $C$, and the linear system $S_L$ shows how to prove that $C' \vdash_l C$. In the following, we assume that there exists a total order $\prec$ among the private signals in $V$, which is used to select a signal among the private signals of a constraint $c$, denoted by $V(c)$.


(REMOVE) $((U,V,S \cup \{c\}), S_L) \Rightarrow ((U,V,S), S_L)$, if $S \models c$.
(DEDUCE) $((U,V,S), S_L) \Rightarrow ((U,V,S), S_L \cup \{c\})$, if $c$ linear, $S \models c$, $c \notin S_L$.
(SIMPLIFY) $((U,V,S), S_L \cup \{s = l\}) \Rightarrow ((U, V \setminus \{s\}, S[s \mapsto l]), S_L[s \mapsto l] \cup \{s = l\})$, if $s \in V$ and $\forall z \in V(l),\ z \prec s$.

Fig. 1. Circuit transformation rules.
The REMOVE rule allows us to remove redundant constraints. The DEDUCE rule is needed to extract linear relations among the signals from $S$. Finally, the SIMPLIFY rule allows us to safely remove a signal $s$ from $V$ by replacing it in $S$ with an equivalent linear combination of public and (strictly) smaller private signals. The fact that we replace a private signal by strictly smaller ones prevents this rule from being applied infinitely many times. When no constraint or private signal can be removed from a circuit (e.g., from $C_3$) after applying a sequence of reduction steps, the circuit is considered irreducible and we call it a normal form. Note that the linear constraints in $S_L$ with signals not belonging to $U \cup V$ are the ones that track how to obtain the missing signals from the remaining ones.
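A minimal sketch of SIMPLIFY followed by REMOVE of the resulting tautologies, assuming a hypothetical representation in which each constraint is a triple $(A, B, C)$ of linear forms stored as dictionaries from signal names to coefficients modulo a small prime. The tautology test is only the syntactic $0 = 0$ check; it does not decide $S \models c$ in general, and the order $\prec$ on private signals is not enforced. Applied to $S_1$ with $z = y - 2$ (the first substitution used in Example 4), it removes $Q_4$ and keeps three constraints, with $x$ and $y$ as the remaining private signals.

```python
from typing import Dict, List, Set, Tuple

P = 7       # hypothetical small prime used only for illustration
ONE = "1"   # key used for the constant term of a linear form

LinForm = Dict[str, int]                        # signal -> coefficient mod P
Constraint = Tuple[LinForm, LinForm, LinForm]   # (A, B, C), meaning A*B - C = 0


def norm(f: LinForm) -> LinForm:
    """Reduce coefficients modulo P and drop zero entries."""
    return {k: c % P for k, c in f.items() if c % P}


def substitute(f: LinForm, s: str, l: LinForm) -> LinForm:
    """Compute f[s |-> l]: replace signal s in f by the linear form l."""
    if s not in f:
        return f
    out = {k: c for k, c in f.items() if k != s}
    for k, c in l.items():
        out[k] = out.get(k, 0) + f[s] * c
    return norm(out)


def is_trivial(q: Constraint) -> bool:
    """Syntactic 0 = 0 test: (A = 0 or B = 0) and C = 0. Not a full check of S |= c."""
    a, b, c = q
    return (not a or not b) and not c


def simplify(cons: List[Constraint], private: Set[str], s: str, l: LinForm):
    """SIMPLIFY with s = l, followed by REMOVE of the resulting trivial constraints."""
    assert s in private and s not in l
    new = [(substitute(a, s, l), substitute(b, s, l), substitute(c, s, l))
           for a, b, c in cons]
    return [q for q in new if not is_trivial(q)], private - {s}


# S1 from Example 1 over the signals v, w, x, y, z.
s1 = [
    (norm({"w": 1}), norm({"y": 1, "z": 1}), norm({"x": 4, ONE: 10})),           # Q1
    (norm({"w": 1}), norm({"z": 1}), norm({"w": 1, ONE: 3})),                    # Q2
    (norm({"x": 1, "w": -1, ONE: 1}), norm({"v": 1}), norm({"v": 1, ONE: -1})),  # Q3
    (norm({ONE: 1}), norm({"y": 1, "z": -1, ONE: -2}), {}),                      # Q4
]
reduced, priv = simplify(s1, {"x", "y", "z"}, "z", norm({"y": 1, ONE: -2}))
print(len(reduced), sorted(priv))  # 3 ['x', 'y']
```

After the substitution, the $B$ component of $Q_1$ becomes $2y - 2$ (stored as `{"y": 2, "1": 5}`, since $-2 \equiv 5 \pmod 7$), as in Example 5.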

The three rules from Fig. 1 are terminating, and they are contained in the circuit reducibility relation (Definition 3) when projected to the first component (the circuit). Regarding confluence, if $(C, S_L) \Rightarrow^* (C_1, S_{L_1})$ and $(C, S_L) \Rightarrow^* (C_2, S_{L_2})$, then there exist $(C'_1, S'_{L_1})$ and $(C'_2, S'_{L_2})$ with $(C_1, S_{L_1}) \Rightarrow^* (C'_1, S'_{L_1})$ and $(C_2, S_{L_2}) \Rightarrow^* (C'_2, S'_{L_2})$ such that $C'_1$ and $C'_2$ are equivalent (see Appendix).


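Example 4. Let us apply our reduction system to find a normal form of $(C_1, \emptyset)$, which corresponds to its reduction. At each step, we label the arrow with the applied rule and show only the component that is modified from the previous step (we use $\_$ to indicate the value of the component in the previous step):

$$
\begin{aligned}
((U,V,S_1),\emptyset)
&\Rightarrow_{\mathrm{DEDUCE}} (\_,\ \{L_1 : z = y - 2\})\\
&\Rightarrow_{\mathrm{SIMPLIFY}} ((U,\ V \setminus \{z\},\ S_1[z \mapsto y - 2]),\ \_)\\
&\Rightarrow_{\mathrm{REMOVE}} ((\_,\ \_,\ S_1[z \mapsto y - 2] \setminus \{0 = 0\}),\ \_)\\
&\Rightarrow_{\mathrm{DEDUCE}} (\_,\ \_ \cup \{L_2 : x = w - 1\})\\
&\Rightarrow_{\mathrm{SIMPLIFY}} ((U,\ \{y\},\ \_[x \mapsto w - 1]),\ \_)\\
&\Rightarrow_{\mathrm{REMOVE}} (C_3,\ \{L_1, L_2\})
\end{aligned}
$$

In the last step, the constraint obtained from $Q_1$, namely $2wy - 6w - 6 = 0$, is removed because it follows from the one obtained from $Q_2$, namely $wy - 3w - 3 = 0$. Here $(C_3, \{L_1, L_2\})$ is a normal form of $(C_1, \emptyset)$ and, as we have already seen in Example 3, $C_3$ is a reduction for $C_1$. Note that $\{L_1, L_2\}$ shows how to obtain the values of the removed signals as a linear combination.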


4 Circuit Reduction Using Constraint Simplification 


In this section, we introduce different strategies to apply the transformation rules described in Fig. 1, and also to approximate the deduction relation $S \models c$ in rules REMOVE and DEDUCE. Note that the classical formulation of our problem is undecidable, but since we work in a finite field, it becomes decidable. However, as the order of $\mathbb{F}$ is large, deciding it exactly is still impractical and approximation is required.

As an example, let us show how the simplification techniques applied in ZoKrates and circom fit in our framework. In both languages, besides the removal of tautologies, all simplification steps are made using linear constraints that are part of the set of constraints. In particular, in a first step both languages handle the so-called redefinitions (i.e., constraints of the form $z = y$), and in a second step all the remaining linear constraints are eliminated by applying the necessary substitutions. In our framework, these simplification steps can be described as a sequence of DEDUCE steps to obtain the linear constraints that will be applied as substitutions, followed by a sequence of SIMPLIFY steps and a sequence of REMOVE steps to delete the tautologies obtained after the substitutions. The whole sequence can be repeated until no linear constraints are left in the circuit. The specific strategy followed to perform the sequence of DEDUCE steps, which obtains the substitutions used to simplify the circuit from its linear constraints, has a big impact on the efficiency of the process. For instance, circom considers all maximal clusters of linear constraints (sharing signals) in the system and then infers in one go all the substitutions to be applied for every cluster, using a lazy version of Gauss-Jordan elimination. This process can be very expensive when the number of constraints in the R1CS is very large (e.g., hundreds of millions in ZK-Rollups like Hermez [20]).
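The clustering step described above can be sketched with a union-find structure. This is an illustrative Python sketch under our own assumptions (each linear constraint is given only by the set of signals it mentions, and union-find is our choice of data structure), not circom's implementation; the subsequent lazy Gauss-Jordan elimination per cluster is not shown.

```python
from typing import Dict, List, Set


def linear_clusters(constraints: List[Set[str]]) -> List[List[int]]:
    """Partition constraint indices into maximal clusters that transitively share signals."""
    parent = list(range(len(constraints)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_owner: Dict[str, int] = {}  # first constraint seen mentioning each signal
    for i, signals in enumerate(constraints):
        for s in signals:
            if s in first_owner:
                parent[find(i)] = find(first_owner[s])  # merge the two clusters
            else:
                first_owner[s] = i

    groups: Dict[int, List[int]] = {}
    for i in range(len(constraints)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical linear constraints, each given only by the signals it mentions.
example = [{"a", "b"}, {"c"}, {"b", "d"}, {"e", "c"}, {"f"}]
print(linear_clusters(example))  # [[0, 2], [1, 3], [4]]
```

Each cluster can then be processed independently, which is what keeps the elimination step tractable when clusters are small.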

Similar techniques based on analyzing the linear constraints are applied in other circuit-design languages. However, to the best of our knowledge, no language uses the non-linear part of the circuit to infer new linear constraints or to remove redundant constraints, and this constitutes the second main contribution of this work. In the remainder of this section, we present a new approach inspired by techniques used in program analysis and SMT solving like [9,11], where non-linear reasoning is reduced to linear reasoning. We can assume that we have first applied the aforementioned strategies to obtain an R1CS containing only non-linear constraints (or linear constraints with only public signals). Then, in our framework, the problem of inferring new linear constraints from a non-linear R1CS can be formalized as a synthesis problem as follows: “given a circuit $(U,V,S)$, where $U \cup V = \{s_1,\dots,s_n\}$, our goal is to find a linear expression $l = c_0 + c_1 s_1 + \dots + c_n s_n$ with $c_0, c_1, \dots, c_n \in \mathbb{F}$ such that $S \models l = 0$.” In order to solve this problem, we follow an efficient approach in which we restrict ourselves to the case where $l = 0$ can be expressed as a linear combination of constraints in $S$, i.e., of the form $\sum_k \lambda_k \cdot Q_k$ with $Q_k \in S$ and $\lambda_k \in \mathbb{F}$. It is clear that any constraint $l = 0$ obtained using this approach satisfies $S \models l = 0$, but we are only interested in the ones that are linear. In the following two stages, we describe how to obtain linear expressions $l$ and hence infer the constraints.

Stage 1. First, for each constraint $Q_k: A_k \times B_k - C_k = 0$, $k \in \{1,\dots,m\}$, we expand the multiplication $A_k \times B_k$, obtaining the expression $\sum_{1 \le i \le j \le n} Q_k[i,j] \cdot s_i s_j + L_k$, where $Q_k[i,j]$ for $1 \le i \le j \le n$ denotes the coefficient of the monomial $s_i s_j$ in the constraint $Q_k$, and $L_k$ is the linear part of $A_k \times B_k$.


Example 5. Let us consider the circuit from Example 4 after applying the first three transformation steps, i.e., after removing the linear constraint. We denote the resulting circuit $C_4 = (U, V_4, S_4)$, where $U \cup V_4 = \{v, w, x, y\}$ and $S_4$ is given by:

$Q_1: w \times (2y - 2) - 4x - 10 = 0$, $\quad Q_2: w \times (y - 2) - w - 3 = 0$,
$Q_3: (x - w + 1) \times v - v + 1 = 0$.

With the signals ordered as $s_1 = v$, $s_2 = w$, $s_3 = x$, $s_4 = y$, we have for $Q_1$ that $A_1 = w$, $B_1 = 2y - 2$ and $C_1 = 4x + 10$ (recall that we consider $A_1 \times B_1 - C_1 = 0$). Then, we expand the multiplication $A_1 \times B_1 = 2wy - 2w$, so that $L_1 = -2w$ and $Q_1[2,4] = 2$ (for $wy$), where the latter is the only non-zero coefficient of a quadratic monomial. Similarly, for $Q_2$ we have $C_2 = w + 3$, $Q_2[2,4] = 1$ (also for $wy$) and $L_2 = -2w$. Finally, for $Q_3$ we have $C_3 = v - 1$, $Q_3[1,3] = 1$ (for $vx$), $Q_3[1,2] = -1$ (for $vw$) and $L_3 = v$.
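Stage 1 amounts to a coefficient-by-coefficient product of two linear forms. The Python sketch below is illustrative only: dense coefficient vectors indexed as $[\text{const}, s_1, \dots, s_n]$ and the prime $P = 7$ are assumptions, and `lin` also keeps the constant $a_0 b_0$ of the affine part. It reproduces the coefficients of $Q_1$ and $Q_3$ from Example 5, with $-1$ and $-2$ appearing as 6 and 5 modulo 7.

```python
from typing import Dict, List, Tuple

P = 7  # hypothetical small prime used only for illustration


def expand(a: List[int], b: List[int]) -> Tuple[Dict[Tuple[int, int], int], List[int]]:
    """Expand A*B where A = a[0] + sum_i a[i]*s_i and B = b[0] + sum_i b[i]*s_i.

    Returns (quad, lin): quad[(i, j)], 1 <= i <= j <= n, is the non-zero coefficient
    of s_i*s_j (the Q_k[i, j] of Stage 1), and lin is the remaining affine part,
    with lin[0] = a[0]*b[0] and lin[i] the coefficient of s_i (the L_k of Stage 1).
    """
    n = len(a) - 1
    quad: Dict[Tuple[int, int], int] = {}
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            c = a[i] * b[i] if i == j else a[i] * b[j] + a[j] * b[i]
            if c % P:
                quad[(i, j)] = c % P
    lin = [a[0] * b[0] % P] + [(a[0] * b[i] + a[i] * b[0]) % P for i in range(1, n + 1)]
    return quad, lin


# Signals ordered s1 = v, s2 = w, s3 = x, s4 = y; vectors are [const, v, w, x, y].
q1 = expand([0, 0, 1, 0, 0], [-2, 0, 0, 0, 2])   # A1 = w,         B1 = 2y - 2
q3 = expand([1, 0, -1, 1, 0], [0, 1, 0, 0, 0])   # A3 = x - w + 1, B3 = v
print(q1)  # ({(2, 4): 2}, [0, 0, 5, 0, 0]), i.e. 2wy and L1 = -2w
print(q3)  # ({(1, 2): 6, (1, 3): 1}, [0, 1, 0, 0, 0]), i.e. -vw + vx and L3 = v
```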


Stage 2. Now, we can model a sufficient condition of linearity using the previous ingredients: if there exist $\lambda_1, \dots, \lambda_m \in \mathbb{F}$ such that, for every $i, j$ with $1 \le i \le j \le n$, we have that $\sum_{k=1}^{m} \lambda_k \cdot Q_k[i,j] = 0$, then $l = \sum_{k=1}^{m} \lambda_k \cdot (L_k - C_k)$ is linear and $S \models l = 0$, since $\sum_{k=1}^{m} \lambda_k \cdot (A_k \times B_k - C_k)$ reduces to $l$ once all quadratic monomials cancel. Moreover, assuming that $S$ is consistent, either $l = 0$ is the tautology $0 = 0$ or it is a non-trivial linear constraint. In the first case, any of the constraints $Q_k$ with $\lambda_k \neq 0$ follows from the rest of the constraints and we can apply the REMOVE rule. In the second case, we can apply DEDUCE and later SIMPLIFY if $l$ has at least one private signal. Note that, after applying SIMPLIFY, one of the constraints $Q_k$ with $\lambda_k \neq 0$ will follow from the rest, and we will finally be able to apply REMOVE.


Example 6 (continued). Following the example, we need to find $\lambda_1, \lambda_2, \lambda_3$ such that (considering only the non-zero coefficients $Q_k[i,j]$) $2\lambda_1 + \lambda_2 = 0$ (for $[2,4]$), $\lambda_3 = 0$ (for $[1,3]$), and $-\lambda_3 = 0$ (for $[1,2]$). Since the monomials $vx$ and $vw$ occur only once, the only solution for $\lambda_3$ is 0. Now, solving $2\lambda_1 + \lambda_2 = 0$, we get that $\lambda_2 = -2\lambda_1$. Hence, we take the solution $\lambda_1 = 1$ and $\lambda_2 = -2$. With this solution, $l = 1 \cdot (-2w - (4x + 10)) + (-2) \cdot (-2w - (w + 3)) + 0 \cdot (v - (v - 1))$. Hence, we obtain $4w - 4x - 4 = 0$, which is equivalent to $x - w + 1 = 0$, the deduced linear constraint used in Example 4 to reduce the original circuit.


To conclude, finding $\lambda_1, \dots, \lambda_m \in \mathbb{F}$ such that $\sum_{k=1}^{m} \lambda_k \cdot Q_k[i,j] = 0$ for every $i, j$ with $1 \le i \le j \le n$ is a linear problem that can be solved using Gaussian elimination or similar techniques. Note that we are only interested in solutions with at least one $\lambda_k \neq 0$. Therefore, we can efficiently synthesize new linear constraints, or show that some constraint follows from the others, using this approach.
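Stage 2 reduces to computing the null space of a matrix with one row per quadratic monomial and one column per constraint. A minimal Python sketch using Gauss-Jordan elimination modulo a small prime follows; $P = 7$ and all helper names are illustrative assumptions. Every basis vector has its free coordinate set to 1, so it satisfies the requirement that at least one $\lambda_k \neq 0$. On Example 6 it returns $\lambda = (3, 1, 0)$, which is $3 \cdot (1, -2, 0)$ modulo 7, and the corresponding $l$ is $2(x - w + 1)$.

```python
from typing import List

P = 7  # hypothetical small prime; the fields used in practice have roughly 254 bits


def nullspace_mod_p(rows: List[List[int]], ncols: int) -> List[List[int]]:
    """Basis of {x in F_P^ncols : rows . x = 0}, via Gauss-Jordan elimination mod P."""
    m = [[v % P for v in r] for r in rows]
    pivots: List[int] = []
    r = 0
    for c in range(ncols):
        pr = next((i for i in range(r, len(m)) if m[i][c]), None)
        if pr is None:
            continue  # no pivot in this column: it is a free variable
        m[r], m[pr] = m[pr], m[r]
        inv = pow(m[r][c], P - 2, P)
        m[r] = [v * inv % P for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c]
                m[i] = [(vi - f * vr) % P for vi, vr in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        x = [0] * ncols
        x[free] = 1
        for row, pc in enumerate(pivots):
            x[pc] = -m[row][free] % P
        basis.append(x)
    return basis


# Example 6: one row per quadratic monomial, one column per constraint Q1, Q2, Q3.
monomial_rows = [
    [2, 1, 0],   # w*y
    [0, 0, 1],   # v*x
    [0, 0, -1],  # v*w
]
# L_k - C_k as affine forms [const, v, w, x, y].
affine = [
    [-10, 0, -2, -4, 0],  # Q1: -2w - (4x + 10)
    [-3, 0, -3, 0, 0],    # Q2: -2w - (w + 3)
    [1, 0, 0, 0, 0],      # Q3: v - (v - 1)
]
for lam in nullspace_mod_p(monomial_rows, 3):
    l = [sum(lk * f[t] for lk, f in zip(lam, affine)) % P for t in range(5)]
    print(lam, l)  # [3, 1, 0] [2, 0, 5, 2, 0], i.e. 2(x - w + 1) modulo 7
```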

Regarding the practical application of our technique, since we sometimes handle very large sets of non-linear constraints, additional engineering is needed to make it work. For instance, we need to remove those constraints that have a quadratic monomial that appears in no other constraint (such a constraint can only take $\lambda_k = 0$), and, after that, compute maximal clusters sharing the same quadratic monomials. We have observed in our experimental evaluation that, in general, even for large circuits, each cluster remains small. Thanks to this, we obtain rather small independent sets of constraints that can be solved in parallel using Gaussian elimination.
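The preprocessing described above can be sketched as follows. In this illustrative Python version (representation and names are hypothetical), a constraint is identified by its set of quadratic monomials. The pruning is repeated until a fixpoint, since removing one constraint can leave another with a now-unique monomial (this iteration is our choice), and the remaining constraints are grouped by a depth-first search over shared monomials. On $C_4$ from Example 5, $Q_3$ is pruned and $Q_1$, $Q_2$ form a single cluster, consistent with $\lambda_3 = 0$ in Example 6.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Monomial = Tuple[str, str]
Cons = Dict[str, FrozenSet[Monomial]]  # constraint name -> its quadratic monomials


def prune_unique(cons: Cons) -> Cons:
    """Repeatedly drop constraints that own a monomial no other constraint has.

    Such a constraint must get lambda_k = 0 in Stage 2, so it cannot take part
    in any linear combination that cancels all quadratic monomials.
    """
    cons = dict(cons)
    while True:
        counts = Counter(m for ms in cons.values() for m in ms)
        lonely = [k for k, ms in cons.items() if any(counts[m] == 1 for m in ms)]
        if not lonely:
            return cons
        for k in lonely:
            del cons[k]


def monomial_clusters(cons: Cons) -> List[List[str]]:
    """Maximal groups of constraints connected through shared quadratic monomials."""
    users: Dict[Monomial, List[str]] = {}
    for k, ms in cons.items():
        for m in ms:
            users.setdefault(m, []).append(k)
    seen, clusters = set(), []
    for start in cons:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # depth-first search over the constraint/monomial graph
            k = stack.pop()
            group.append(k)
            for m in cons[k]:
                for other in users[m]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        clusters.append(sorted(group))
    return clusters


# Quadratic monomials of C4 (Example 5): Q3's monomials vx and vw are unique to it.
c4 = {
    "Q1": frozenset({("w", "y")}),
    "Q2": frozenset({("w", "y")}),
    "Q3": frozenset({("v", "x"), ("v", "w")}),
}
kept = prune_unique(c4)
print(sorted(kept), monomial_clusters(kept))  # ['Q1', 'Q2'] [['Q1', 'Q2']]
```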


5 Experimental Results 


This section describes our experimental evaluation in two settings. On the one hand (Sect. 5.1), we have implemented our techniques within circom [21], a novel domain-specific language and compiler for defining arithmetic circuits, fully written in Rust. The circom compiler generates executable code (WebAssembly or C++) to compute the witness, together with the R1CS, since both are later needed by ZK tools to produce ZK proofs. The implementation is available in a public fork of the compiler [1]. On the other hand (Sect. 5.2), we have decoupled the constraint optimization module from the circom compiler into a new project, which is accessible online [2], so that it can be used after other cryptographic-language compilers that produce R1CS, in our case ZoKrates [12]. ZoKrates is a high-level language that allows the programmer to abstract away the technicalities of building arithmetic circuits. The input to our optimizer is the R1CS in the smtlib2 format generated by ZoKrates. The goal of our experiments is twofold: (1) assess the scalability of the approach when applied to real-world circuits used in industry, and (2) evaluate its impact both on code that is already highly optimized (such as circom's libraries, developed in a low-level language by experienced programmers) and on code automatically compiled from a high-level language such as ZoKrates. In both cases, the optimizations of linear constraints that the compilers include (see Sect. 4) are enabled, so that the reduction gains are due only to our optimization. Experimental results have been obtained using an AMD Ryzen Threadripper PRO 3995WX 64-Cores Processor with 512 GB of RAM (Linux Kernel Debian 5.10.70-1).


5.1 Results on circom Circomlib 


circom is a modular language that allows the definition of parameterizable small circuits called “templates” and has its own library called circomlib [22]. This library is widely used for cryptographic purposes and contains hundreds of templates such as comparators, hash functions, digital signatures, binary and decimal converters, and many more. Our experiments have been performed on the available test cases from circomlib. Many of them have been carefully programmed by experienced cryptographers to avoid unnecessary non-linear constraints, and for these our optimization cannot deduce new linear constraints. Still, we are able to reduce 26% of the tests (12 out of 46).

Table 1 shows the results for the five circuits that we optimize the most. For each of them, we show: (#C) the number of generated constraints, (#R) the number of removed constraints, (G%) the gain, computed as #R/#C × 100, and (T(s)) the compilation time. The largest gain is for pointbits_loopback, where circom generates 2,333 constraints and we remove 381 of them; our gain is 16.33% and the compilation time is 13.4s.

Table 1. Results on circomlib.

Circuit               #C      #R    G%       T(s)
sha256_2_test         30134   32    0.11%    15.6s
eddsamimc_test        5712    46    0.81%    1.9s
eddsaposeidon_test    4217    46    1.09%    1.7s
…                     7554    556   7.36%    4.8s
pointbits_loopback    2333    381   16.33%   13.4s

As explained in Sect. 4, for each linear constraint deduced by our technique, we are always able to remove a non-linear constraint and, in general, also a signal. Note that we sometimes produce new linear constraints in which all the involved signals are public, and thus none of them can be removed. Importantly, in spite of the manual simplifications already made in most of the circuits in circomlib, our techniques detect further redundant constraints in a short time. Such small reductions in templates of circomlib


Distilling Constraints in Zero-Knowledge Protocols 439 


can produce larger gains, since they are repeatedly used as subcomponents in 
industrial circuits. 
5.2 Results on ZoKrates Stdlib 


We have used two kinds of circuits from the ZoKrates stdlib for our experimental evaluation.

Table 2. Results on stdlib.

Circuit               #C      #R     G%      T(s)
sha256bit             25730   288    1.1%    35.0s
sha512bit             26838   544    2.0%    37.8s
sha1024bit            54284   1312   2.4%    82.4s
sha1536bit            81730   2080   2.6%    128s
Poseidon              3912    851    21.8%   0.3s
EdwardsAdd            17      4      23.6%   0.07s
EdwardsOrderCheck     56      15     26.8%   0.07s
EdwardsScalarMult     9989    2304   23.1%   0.2s
ProofOfOwnership      9984    2306   23.0%   0.5s

(1) The first four circuits, shaXbit, are implementations of different SHA-2 hash functions [19], where X indicates the size of the output. SHA-2 hashes are constructed from the repeated use of simple computation units that heavily use bit operations. Bit operations are very inefficient inside arithmetic circuits [13] and, as a result, the number of constraints describing these circuits is very large (see Table 2). The number of constraints deduced is quite low for this kind of circuit, since specialized optimization for bitwise operations is required (other compilers like xJsnark [23] specialize in this). This also happens in the circom implementation of SHA-256-2 (row 1 of Table 1). However, Poseidon [16] is a recent hash function that was designed taking into account the nature of arithmetic circuits over a prime field $\mathbb{F}$, and, as a result, the function can be described with far fewer constraints. Our approach is able to optimize the current implementation of Poseidon by more than 20%, which represents a very significant reduction. (2) The second kind are the last four circuits: they provide the basis for implementing elliptic curve cryptography inside circuits. Our optimizer detects, in negligible time, that more than 23% of their constraints are redundant and can be removed. Verifying whether a pair of public/private keys matches (ProofOfOwnership) is fundamental in almost every security situation; hence, the optimization of this circuit is particularly relevant for saving blockchain space. For this reason, we have parameterized ProofOfOwnership by the number of public/private key pairs to be verified, and we have measured the performance impact (time and memory consumption) of the snarkjs setup step on these circuits without simplification (Table 3) and after simplification (Table 4). The results show the effect of our reduction when the constraints are later used by snarkjs to produce ZK proofs.
Table 3. Results on different instantiations of ProofOfOwnership from stdlib without nonlinear simplification. The ERROR in the last row is an out-of-memory error.

                          Generation                         snarkjs
Circuit                   T(s)       #C           Size      T(s)        Memory
ProofOfOwnership-400      1m58.1s    3,902,378    582MB     7m26.8s     14.4GB
ProofOfOwnership-1000     4m54.7s    9,740,978    1.5GB     37m50.0s    33.1GB
ProofOfOwnership-1200     6m09.6s    11,687,178   1.7GB     47m15.7s    36.2GB
ProofOfOwnership-1400     6m50.1s    13,633,378   2.0GB     ERROR       ERROR

Table 4. Results on different instantiations of ProofOfOwnership from stdlib with nonlinear simplification.

                          Generation                         snarkjs
Circuit                   T(s)       #C           Size      T(s)        Memory
ProofOfOwnership-400      3m11.0s    2,970,072    451MB     5m00.1s     12.7GB
ProofOfOwnership-1000     8m05.1s    7,413,672    1.1GB     23m40.8s    24.6GB
ProofOfOwnership-1200     9m43.8s    8,894,872    1.4GB     31m46.8s    30.7GB
ProofOfOwnership-1400     11m06.4s   10,376,072   1.6GB     38m31.0s    32.7GB


The impact of our simplification on the setup step of snarkjs is significant and outweighs the increase in compilation time. However, this step is applied only once. We have also measured the impact on performance when generating a ZK proof for a given witness using snarkjs after the setup step, which is the action repeated many times in a real context. Our experiments show that, e.g., with ProofOfOwnership-400 we improve from 41s to 35s and with ProofOfOwnership-1000 we improve from 1m 53s to 1m 12s.

In conclusion, our experiments show that the higher the level of abstraction, the more redundant constraints the compiler introduces in the R1CS. Our proposed techniques are an efficient and effective solution to enhance performance in this setting. On the other hand, circuits written in a low-level language by security experts (usually optimized by hand), or circuits using bitwise operations, leave little room for optimization by our techniques.


6 Related Work and Conclusions 


We have proposed the application of (non-linear) constraint reasoning techniques to the new application domain of ZK protocols. Our approach has wide applicability since, in the last few years, much effort has been put into developing new programming languages that enable the generation and verification of ZK proofs and that also focus on the design of arithmetic circuits and the constraint encoding. Among the different solutions, we can distinguish: libraries (bellman [7], libsnark [29], snarky [28]), programming-focused languages (ZoKrates [12], xJsnark [23], zinc [24], Leo [10]), and hardware-description languages (circom). As opposed to the initial library approach, both programming and hardware-description languages put focus on the design of arithmetic circuits and the constraint encoding. In this regard, ZoKrates, xJsnark, and the circom compiler implement one simple but powerful R1CS-specific optimization called linearity reduction: it consists of substituting the linear constraints to generate a new circuit whose system only consists of non-linear constraints. However, they do not deduce new constraints to detect further redundancies in the system. Linearity reduction is a particular case of our reduction rules in which the only linear constraints that can be deduced and added to the linear system are those that follow from linear constraints present in the constraint system. On the other hand, the constraint system generated by Leo is only optimized at the level of its intermediate representation, not at the R1CS level where our method works.

There has also been a joint effort towards standardization and interoperability between different programs, such as CirC [26], an infrastructure for building compilers to logical constraint representations. Currently, CirC only applies the linearity reduction explained above. Recently, an interface called zkInterface [6] has been built to improve the interoperability among several frontends, like ZoKrates and snarky, which express statements in a high-level language and compile them into an R1CS representation, and several backends implementing ZK protocols, like Groth16 [17] and Pinocchio [27], which use the R1CS representation to produce ZK proofs. zkInterface could benefit from our optimization by applying our reduction to every circuit generated by any of the accepted frontends. Since zkInterface is also written in Rust, our optimizer could easily be integrated as a new gadget for the tool in the future. Finally, we believe that the techniques presented in this paper can lead to new reduction schemes to be applied over PlonK [14] constraint systems.
Abstract. In many Internet of Things (IoT) applications, data sensed
by an IoT device are continuously sent to the server and monitored 
against a specification. Since the data often contain sensitive informa- 
tion, and the monitored specification is usually proprietary, both must be 
kept private from the other end. We propose a protocol to conduct obliv- 
ious online monitoring—online monitoring conducted without revealing 
the private information of each party to the other—against a safety LTL 
specification. In our protocol, we first convert a safety LTL formula into a 
DFA and conduct online monitoring with the DFA. Based on fully homo- 
morphic encryption (FHE), we propose two online algorithms (REVERSE 
and BLOCK) to run a DFA obliviously. We prove the correctness and secu- 
rity of our entire protocol. We also show the scalability of our algorithms 
theoretically and empirically. Our case study shows that our algorithms 
are fast enough to monitor blood glucose levels online, demonstrating 
our protocol’s practical relevance. 


1 Introduction 


Internet of Things (IoT) [3] devices enable various service providers to monitor 
personal data of their users and to provide useful feedback to the users. For 
example, a smart home system can save lives by raising an alarm when a gas stove 
is left on to prevent a fire. Such a system is realized by the continuous monitoring 
of the data from the IoT devices in the house [8,18]. Another application of IoT 
devices is medical IoT (MIoT) [16]. In MIoT applications, biological information, 
such as electrocardiograms or blood glucose levels, is monitored, and the user is 
notified when an abnormality is detected (such as arrhythmia or hyperglycemia). 
In many IoT applications, monitoring must be conducted online, i.e., a stream 
of sensed data is continuously monitored, and the violation of the monitoring 
specification must be reported even before the entire data are obtained. In the 
smart home and MIoT applications, online monitoring is usually required, as continuous sensing is crucial for immediately notifying emergency responders, such as police officers or doctors, of ongoing abnormal situations.
Fig. 1. The proposed oblivious online LTL monitoring protocol.
(b) Algorithm REVERSE, where $M^R$ is the reversed DFA of $M$.
(c) Algorithm BLOCK with block size $B = 2$. Each block of length $B$ is consumed with a variant of OFFLINE. The intermediate result at each block is used in the consumption of the next block.

Fig. 2. How our algorithms consume the data $d_1, d_2, \dots, d_n$ with the DFA $M$.


As specifications generally contain proprietary information or sensitive 
parameters learned from private data (e.g., with specification mining [27]), the 
specifications must be kept secret. One approach to preserving this privacy is to adopt the client-server model for the monitoring system. In such a model, the
sensing device sends the collected data to a server, where the server performs 
the necessary analyses and returns the results to the device. Since the client does 
not have access to the specification, the server’s privacy is preserved. 

However, the client-server model does not inherently protect the client’s pri- 
vacy from the servers, as the data collected from and results sent back to the 
users are revealed to the servers in this model; that is to say, a user has to trust 
the server. This trust is problematic if, for example, the server itself intentionally 
or unintentionally leaks sensitive data of device users to an unauthorized party. 
Thus, we argue that a monitoring procedure should achieve the following goals: 


Online Monitoring. The monitored data need not be known beforehand. 
Client’s Privacy. The server shall not know the monitored data and results. 
Server’s Privacy. The client shall not know what property is monitored. 


We call a monitoring scheme with these properties oblivious online monitoring. 
With an oblivious online monitoring procedure, 1) a user can obtain a monitoring result while hiding both her sensitive data and the result itself from the server, and 2) the server can conduct online monitoring while hiding the specification from the user.


Contribution. In this paper, we propose a novel protocol (Fig. 1) for oblivious 
online monitoring against a specification in linear temporal logic (LTL) [33]. 
More precisely, we use a safety LTL formula [26] as a specification, which can be 
translated to a deterministic finite automaton (DFA) [36]. In our protocol, we first convert a safety LTL formula into a DFA and conduct online monitoring with the DFA. For online and oblivious execution of a DFA, we propose two algorithms based on fully homomorphic encryption (FHE). FHE allows us to evaluate an arbitrary function over ciphertexts, and there is an FHE-based algorithm to evaluate a DFA obliviously [13]. However, this algorithm is leveled homomorphic, i.e., the FHE parameters depend on the number of monitored ciphertexts, so it is not applicable to online monitoring.

In this work, we first present a fully homomorphic offline DFA evaluation 
algorithm (OFFLINE) by extending the leveled homomorphic algorithm in [13]. 
Although we can remove the parameter dependence using this method, OFFLINE 
consumes the ciphertexts from back to front (Fig. 2a). As a result, OFFLINE is 
still limited to offline usage only. To truly enable online monitoring, we propose 
two new algorithms based on OFFLINE: REVERSE and BLOCK. In REVERSE, we 
reverse the DFA and apply OFFLINE to the reversed DFA (Fig. 2b). In BLOCK, 
we split the monitored ciphertexts into fixed-length blocks and process each 
block sequentially with OFFLINE (Fig. 2c). We prove that both algorithms have time complexity linear in, and space complexity independent of, the length of the monitored ciphertexts, which guarantees the scalability of our entire protocol.

On top of our online algorithms, we propose a protocol for oblivious online 
LTL monitoring. We assume that the client is malicious, i.e., the client can 
deviate arbitrarily from the protocol, while the server is honest-but-curious, i.e., 
the server honestly follows the protocol but tries to learn the client’s private 
data by exploiting the obtained information. We show that the privacy of both 
parties can be protected under the standard IND-CPA security of FHE schemes 
with the addition of shielded randomness leakage (SRL) security [10,21]. 

We implemented our algorithms for DFA evaluation in C++20 and evaluated their performance. Our experimental results confirm the scalability of our algorithms. Moreover, through a case study on blood glucose level monitoring, we show that our algorithms run fast enough for online monitoring, i.e., they are faster than the sampling interval of current commercial devices that sample glucose levels.

Our contributions are summarized as follows: 


— We propose two online algorithms to run a DFA obliviously. 

— We propose the first protocol for oblivious online LTL monitoring. 

— We proved the correctness and security of our protocol. 

— Our experiments show the scalability and practicality of our algorithms. 


Related Work. There are various works on DFA execution without revealing the monitored data (see Table 1 for a summary). However, to our knowledge, no existing work achieves all three of our goals (i.e., online monitoring, privacy of the client, and privacy of the server) simultaneously. Therefore, none of them is applicable to oblivious online LTL monitoring.

Homomorphic encryption, which we also use, has been applied to run a DFA obliviously [13,25]. Among these works, our algorithm builds on the algorithm in [13]. Although these algorithms guarantee the privacy of the client and the privacy of the server, all of the
homomorphic-encryption-based algorithms are limited to offline DFA execution and do not achieve online monitoring. We note that the extension of [13] for online DFA execution is one of our technical contributions.

Table 1. Related work on DFA execution with privacy of the client.

Work | [37] | [20] | [9] | [35] | [32] | [22] | [25] | [13] | [1] | Ours
Support online monitoring | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓
Private: the client's monitored data | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Private: the DFA (except for its number of states) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
Private: the DFA's number of states | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓
Performance report | ? | ? | ? | ? | ? | ? | ? | ? | ? | ✓

In [1], the authors propose an LTL runtime verification algorithm without 
revealing the monitored data to the server. They propose both offline and online 
algorithms to run a DFA converted from a safety LTL formula. The main issue with their online algorithm is that the DFA running on the server must be revealed to the client, so the goal of server privacy is not satisfied.

Oblivious DFA evaluation (ODFA) [9,20,22,31,35,37] is a technique to run a DFA on a server while the DFA is known only to the server and the monitored data are known only to the client. Although the structure of the DFA is not revealed to the client, the client has to know its number of states. Consequently, the goal of server privacy is satisfied only partially. Moreover, to the best of our knowledge, none of the ODFA-based algorithms support online DFA execution. Therefore, the goal of online monitoring is not satisfied.


Organization. The rest of the paper is organized as follows: In Sect. 2, we review LTL monitoring (Sect. 2.1), the FHE scheme we use (Sect. 2.2), and the leveled homomorphic offline algorithm (Sect. 2.3). Then, in Sect. 3, we explain our fully homomorphic offline algorithm (OFFLINE) and two online algorithms (REVERSE and BLOCK). We describe the proposed protocol for oblivious online LTL monitoring in Sect. 4. After discussing our experimental results in Sect. 5, we conclude the paper in Sect. 6.


2 Preliminaries 


Notations. We denote the set of all nonnegative integers by $\mathbb{N}$, the set of all positive integers by $\mathbb{N}^+$, and the set $\{0,1\}$ by $\mathbb{B}$. Let $X$ be a set. We write $2^X$ for the powerset of $X$. We write $X^*$ for the set of finite sequences of $X$ elements and $X^\omega$ for the set of infinite sequences of $X$ elements. For $u \in X^\omega$, we write $u_i \in X$ for the $i$-th element (0-based) of $u$, $u_{i:j} \in X^*$ for the subsequence $u_i, u_{i+1}, \ldots, u_j$ of $u$, and $u_{i:} \in X^\omega$ for the suffix of $u$ starting from its $i$-th element. For $u \in X^*$ and $v \in X^* \cup X^\omega$, we write $u \cdot v$ for the concatenation of $u$ and $v$.


DFA. A deterministic finite automaton (DFA) is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $\delta\colon Q \times \Sigma \to Q$ is a transition function, $q_0 \in Q$ is an initial state, and $F \subseteq Q$ is a set of final states. If the alphabet of a DFA is $\mathbb{B}$, we call it a binary DFA. For a state $q \in Q$ and a word $w = \sigma_1\sigma_2\cdots\sigma_n$, we define $\delta(q, w) = \delta(\cdots\delta(\delta(q, \sigma_1), \sigma_2), \ldots, \sigma_n)$. For a DFA $M$ and a word $w$, we write $M(w) := 1$ if $M$ accepts $w$; otherwise, $M(w) := 0$. We also abuse the above notations for nondeterministic finite automata (NFAs).
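The following minimal Python sketch makes the extended transition function $\delta(q, w)$ and the notation $M(w)$ concrete for a binary DFA. The class name `BinaryDFA`, the list-based encoding of $\delta$ (states numbered from 0), and the example automaton are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass
class BinaryDFA:
    """A binary DFA (Q, B, delta, q0, F) with Q = {0, 1, ..., len(delta) - 1}."""
    delta: List[List[int]]   # delta[q][sigma] is the successor of q on sigma in {0, 1}
    q0: int
    final: FrozenSet[int]

    def run(self, q: int, word: Sequence[int]) -> int:
        """Extended transition: delta(q, w) = delta(...delta(delta(q, w_1), w_2)..., w_n)."""
        for sigma in word:
            q = self.delta[q][sigma]
        return q

    def __call__(self, word: Sequence[int]) -> int:
        """M(w) := 1 if M accepts w, otherwise 0."""
        return int(self.run(self.q0, word) in self.final)


# Hypothetical example: M accepts exactly the words that contain "11".
contains_11 = BinaryDFA(delta=[[0, 1], [0, 2], [2, 2]], q0=0, final=frozenset({2}))
print(contains_11([0, 1, 1]), contains_11([1, 0, 1]))  # 1 0
```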


2.1 LTL 


We use linear temporal logic (LTL) [33] to specify the monitored properties. The following BNF defines the syntax of LTL formulae: $\varphi, \psi ::= \top \mid p \mid \neg\varphi \mid \varphi \land \psi \mid X\varphi \mid \varphi U \psi$, where $\varphi$ and $\psi$ range over LTL formulae and $p$ ranges over a set $AP$ of atomic propositions.

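An LTL formula asserts a property of $u \in (2^{AP})^\omega$. The sequence $u$ expresses an execution trace of a system; $u_i$ is the set of the atomic propositions satisfied at the $i$-th time step. Intuitively, $\top$ represents an always-true proposition; $p$ asserts that $u_0$ contains $p$, and hence $p$ holds at the 0-th step in $u$; $\neg\varphi$ is the negation of $\varphi$; and $\varphi \land \psi$ is the conjunction of $\varphi$ and $\psi$. The temporal proposition $X\varphi$ expresses that $\varphi$ holds from the next step (i.e., in $u_{1:}$); $\varphi U \psi$ expresses that $\psi$ holds eventually and $\varphi$ continues to hold until then. We write $\bot$ for $\neg\top$; $\varphi \lor \psi$ for $\neg(\neg\varphi \land \neg\psi)$; $\varphi \Rightarrow \psi$ for $\neg\varphi \lor \psi$; $F\varphi$ for $\top U \varphi$; $G\varphi$ for $\neg(F\neg\varphi)$; $G_{[n,m]}\varphi$ for $\underbrace{X \cdots X}_{n \text{ occ. of } X}(\varphi \land \underbrace{X(\varphi \land X(\cdots \land X}_{(m-n) \text{ occ. of } X}\varphi)))$; and $F_{[n,m]}\varphi$ for $\underbrace{X \cdots X}_{n \text{ occ. of } X}(\varphi \lor \underbrace{X(\varphi \lor X(\cdots \lor X}_{(m-n) \text{ occ. of } X}\varphi)))$.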
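The bounded operators $G_{[n,m]}$ and $F_{[n,m]}$ are pure abbreviations, so their expansion can be computed mechanically. The hypothetical helpers below (our own, operating on formula strings with `&`, `|`, and `X` as ASCII stand-ins for $\land$, $\lor$, and $X$) build the nested form defined above; they assume a compound argument formula is already parenthesized.

```python
def bounded(op: str, n: int, m: int, phi: str) -> str:
    """Expand G_[n,m] phi (op="&") or F_[n,m] phi (op="|") into nested X operators.

    The result has n occurrences of X outside and (m - n) occurrences inside.
    """
    if not 0 <= n <= m:
        raise ValueError("expected 0 <= n <= m")
    body = phi
    for _ in range(m - n):          # phi op X(phi op X(... op X(phi)))
        body = f"{phi} {op} X({body})"
    for _ in range(n):              # X(...X(body)...)
        body = f"X({body})"
    return body


def g_interval(n: int, m: int, phi: str) -> str:
    return bounded("&", n, m, phi)


def f_interval(n: int, m: int, phi: str) -> str:
    return bounded("|", n, m, phi)


print(g_interval(1, 3, "p"))  # X(p & X(p & X(p)))
```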

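We formally define the semantics of LTL below. Let $u \in (2^{AP})^\omega$, $i \in \mathbb{N}$, and $\varphi$ be an LTL formula. We define the relation $u, i \models \varphi$ as the least relation that satisfies the following:

$$
\begin{array}{lcl}
u, i \models \top & & \\
u, i \models p & \overset{\mathrm{def}}{\iff} & p \in u_i \\
u, i \models \neg\varphi & \overset{\mathrm{def}}{\iff} & u, i \not\models \varphi \\
u, i \models \varphi \land \psi & \overset{\mathrm{def}}{\iff} & u, i \models \varphi \text{ and } u, i \models \psi \\
u, i \models X\varphi & \overset{\mathrm{def}}{\iff} & u, i+1 \models \varphi \\
u, i \models \varphi U \psi & \overset{\mathrm{def}}{\iff} & \text{there exists } j \ge i \text{ such that } u, j \models \psi \text{ and, for any } k,\ i \le k < j \implies u, k \models \varphi
\end{array}
$$

We write $u \models \varphi$ for $u, 0 \models \varphi$ and say $u$ satisfies $\varphi$.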

In this paper, we focus on the safety fragment [26] of LTL (i.e., properties stating that nothing bad happens). A finite sequence $w \in (2^{AP})^*$ is a bad prefix for an LTL formula $\varphi$ if $w \cdot v \not\models \varphi$ holds for any $v \in (2^{AP})^\omega$. For any bad prefix $w$, we cannot extend $w$ to an infinite word that satisfies $\varphi$. An LTL formula $\varphi$ is a safety LTL formula if, for any $w \in (2^{AP})^\omega$ satisfying $w \not\models \varphi$, $w$ has a bad prefix for $\varphi$.

A safety monitor (or simply a monitor) is a procedure that takes $w \in (2^{AP})^\omega$ and a safety LTL formula $\varphi$ and generates an alert if $w \not\models \varphi$. From the definition of safety LTL, it suffices for a monitor to detect a bad prefix of $\varphi$. It is known that, for any safety LTL formula $\varphi$, we can construct a DFA $M_\varphi$ recognizing the set of the bad prefixes of $\varphi$ [36], which can be used as a monitor.
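As a hypothetical illustration (the formula, state numbering, and function name are ours), take $AP = \{p\}$ and the safety formula $G(p \Rightarrow X\neg p)$, i.e., $p$ never holds at two consecutive steps. Its bad prefixes are exactly the finite words containing two consecutive letters that contain $p$, so a three-state DFA recognizes them. The sketch below encodes each letter $u_i$ as a Boolean stating whether $p \in u_i$ and raises an alert at the first bad prefix.

```python
from typing import Iterable, Optional

# Hand-built DFA for the bad prefixes of G(p -> X !p) over AP = {p}.
# State 0: last letter had no p; 1: last letter had p; 2: bad prefix seen (sink).
DELTA = [[0, 1], [0, 2], [2, 2]]
FINAL = {2}


def monitor(trace: Iterable[bool]) -> Optional[int]:
    """Return the index i at which u_0 ... u_i is first recognized as a bad prefix."""
    q = 0
    for i, has_p in enumerate(trace):
        q = DELTA[q][int(has_p)]
        if q in FINAL:
            return i          # alert: no extension of this prefix satisfies the formula
    return None               # no violation detected so far


print(monitor([False, True, False, True, True, False]))  # 4
```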


2.2 Torus Fully Homomorphic Encryption 


Homomorphic encryption (HE) is a form of encryption that enables us to apply operations to encrypted values without decrypting them. In particular, a type of HE called fully HE (FHE) allows us to evaluate arbitrary functions over encrypted data [11,19,23,24]. We use an instance of FHE called TFHE [13] in this work. We briefly summarize TFHE below; see [13] for a detailed exposition.

Table 2. Summary of TFHE ciphertexts, where N is a parameter of TFHE.

Ciphertext kind | Notation in this paper | Plaintext message | Conversion from TRLWE
TLWE | c | a Boolean value $b \in \mathbb{B}$ | SAMPLEEXTRACT (fast)
TRLWE | c | a Boolean vector $v \in \mathbb{B}^N$ | -
TRGSW | d | a Boolean value $b \in \mathbb{B}$ | SAMPLEEXTRACT and CIRCUITBOOTSTRAPPING (slow)

We are concerned with the following two-party secure computation, where 
the involved parties are a client (called Alice) and a server (called Bob): 1) 
Alice generates the keys used during computation; 2) Alice encrypts her plain- 
text messages into ciphertexts with her keys; 3) Alice sends the ciphertexts to 
Bob; 4) Bob conducts computation over the received ciphertexts and obtains 
the encrypted result without decryption; 5) Bob sends the encrypted results to 
Alice; 6) Alice decrypts the received results and obtains the results in plaintext. 


Keys. There are three types of keys in TFHE: secret key SK, public key PK, and 
bootstrapping key BK. All of them are generated by Alice. PK is used to encrypt 
plaintext messages into ciphertexts, and SK is used to decrypt ciphertexts into 
plaintexts. Alice keeps SK private, i.e., the key is known only to herself but not 
to Bob. In contrast, PK is public and also known to Bob. BK is generated from 
SK and can be safely shared with Bob without revealing SK. BK allows Bob to evaluate the homomorphic operations (defined later) over the ciphertexts.


Ciphertexts. Using the public key, Alice can generate three kinds of ciphertexts 
(Table 2): TLWE (Torus Learning With Errors), TRLWE (Torus Ring Learning 
With Errors), and TRGSW (Torus Ring Gentry-Sahai-Waters). Each homomorphic operation provided by TFHE is defined over specific kinds of ciphertexts. We note that different kinds of ciphertexts have different data structures, and the conversion between them can be time-consuming; Table 2 summarizes the conversions from TRLWE.

In TFHE, different types of ciphertexts represent different plaintext messages. 
A TLWE ciphertext represents a Boolean value. In contrast, TRLWE represents 
a vector of Boolean values of length N, where N is a TFHE parameter. We can 
regard a TRLWE ciphertext as a vector of TLWE ciphertexts, and the conversion 
between a TRLWE ciphertext and a TLWE one is relatively easy. A TRGSW 
ciphertext also represents a Boolean value, but its data structure is quite different 
from TLWE, and the conversion from TLWE to TRGSW is slow. 

TFHE provides different encryption and decryption functions for each type 
of ciphertext. We write Enc(x) for a ciphertext of a plaintext x and Dec(c) for the
plaintext message for the ciphertext c. We abuse these notations for all three 
types of ciphertexts. 
Besides, TFHE supports trivial samples of TRLWE. A trivial sample of TRLWE has the same data structure as a TRLWE ciphertext but is not encrypted, i.e., anyone can tell the plaintext message represented by the trivial sample. We denote by TRIVIAL($n$) a trivial sample of TRLWE whose plaintext message is $(b_1, b_2, \ldots, b_N)$, where each $b_i$ is the $i$-th bit in the binary representation of $n$, counting from the least significant bit.


Homomorphic Operations. TFHE provides homomorphic operations, i.e., 
operations over ciphertexts without decryption. Among the operators supported 
by TFHE [13], we use the following ones. 


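$\mathrm{CMux}(d, c_{\mathit{true}}, c_{\mathit{false}}) : \mathrm{TRGSW} \times \mathrm{TRLWE} \times \mathrm{TRLWE} \to \mathrm{TRLWE}$
Given a TRGSW ciphertext $d$ and TRLWE ciphertexts $c_{\mathit{true}}, c_{\mathit{false}}$, CMux outputs a TRLWE ciphertext $c_{\mathit{result}}$ such that $\mathrm{Dec}(c_{\mathit{result}}) = \mathrm{Dec}(c_{\mathit{true}})$ if $\mathrm{Dec}(d) = 1$, and otherwise, $\mathrm{Dec}(c_{\mathit{result}}) = \mathrm{Dec}(c_{\mathit{false}})$.

$\mathrm{LOOKUP}(\{c_i\}_{i=1}^{2^n}, \{d_i\}_{i=1}^{n}) : (\mathrm{TRLWE})^{2^n} \times (\mathrm{TRGSW})^n \to \mathrm{TRLWE}$
Given TRLWE ciphertexts $c_1, c_2, \ldots, c_{2^n}$ and TRGSW ciphertexts $d_1, d_2, \ldots, d_n$, LOOKUP outputs a TRLWE ciphertext $c$ such that $\mathrm{Dec}(c) = \mathrm{Dec}(c_{k+1})$, where $k = \sum_{i=1}^{n} 2^{i-1} \times \mathrm{Dec}(d_i)$.

$\mathrm{SAMPLEEXTRACT}(k, c) : \mathbb{N} \times \mathrm{TRLWE} \to \mathrm{TLWE}$
Let $\mathrm{Dec}(c) = (b_1, b_2, \ldots, b_N)$. Given $k < N$ and a TRLWE ciphertext $c$, SAMPLEEXTRACT outputs a TLWE ciphertext $c'$ where $\mathrm{Dec}(c') = b_{k+1}$.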


Intuitively, CMux can be regarded as a multiplexer over TRLWE ciphertexts with a TRGSW selector input. The operation LOOKUP regards $c_1, c_2, \ldots, c_{2^n}$ as encrypted entries composing a lookup table (LUT) of depth $n$ and $d_1, d_2, \ldots, d_n$ as inputs to the LUT; its output is the entry selected by these inputs. LOOKUP is constructed from $2^n - 1$ CMux gates arranged in a tree of depth $n$. SAMPLEEXTRACT outputs the $k$-th element (counting from 0) of $c$ as a TLWE ciphertext. Notice that all these operations work over ciphertexts without decrypting them.
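To illustrate what CMux, LOOKUP, and SAMPLEEXTRACT compute, the following Python sketch replaces every ciphertext by its plaintext message. It performs no encryption and offers no security; it only shows the selection logic, the index $k = \sum_{i=1}^{n} 2^{i-1} \times \mathrm{Dec}(d_i)$, and how a tree of CMux gates realizes LOOKUP with $2^n - 1$ gates. The function names are our own.

```python
from typing import List, Sequence, Tuple

Vec = List[int]  # plaintext stand-in for a TRLWE ciphertext (a Boolean vector)


def cmux(d: int, c_true: Vec, c_false: Vec) -> Vec:
    """Plaintext analogue of CMux: the selector d plays the role of Dec(d)."""
    return list(c_true) if d == 1 else list(c_false)


def lookup(entries: Sequence[Vec], selectors: Sequence[int]) -> Tuple[Vec, int]:
    """Plaintext analogue of LookUp, built as a tree of CMux gates.

    entries = (c_1, ..., c_{2^n}) and selectors = (d_1, ..., d_n). Returns c_{k+1}
    with k = sum_i 2^(i-1) * d_i, together with the number of CMux gates used.
    """
    if len(entries) != 2 ** len(selectors):
        raise ValueError("LookUp needs exactly 2^n entries")
    layer = [list(e) for e in entries]
    used = 0
    # d_1 is the least significant selector bit, so it picks within adjacent pairs first.
    for d in selectors:
        layer_next = []
        for j in range(0, len(layer), 2):
            layer_next.append(cmux(d, layer[j + 1], layer[j]))
            used += 1
        layer = layer_next
    return layer[0], used


def sample_extract(k: int, c: Vec) -> int:
    """Plaintext analogue of SampleExtract: the element b_{k+1}, i.e. index k (0-based)."""
    return c[k]


print(lookup([[i] for i in range(8)], [1, 0, 1]))  # ([5], 7)
```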


Noise and Operations for Noise Reduction. In generating a TFHE cipher- 
text, we ensure its security by adding some random numbers called noise. An 
application of a TFHE operation adds noise to its output ciphertext; if the noise 
in a ciphertext becomes too large, the TFHE ciphertext cannot be correctly 
decrypted. There is a special type of operation called bootstrapping¹ [23], which reduces the noise of a TFHE ciphertext.


$\mathrm{BOOTSTRAPPING}_{BK}(c) : \mathrm{TLWE} \to \mathrm{TRLWE}$
Given a bootstrapping key BK and a TLWE ciphertext $c$, BOOTSTRAPPING outputs a TRLWE ciphertext $c'$ where $\mathrm{Dec}(c') = (b_1, b_2, \ldots, b_N)$ and $b_1 = \mathrm{Dec}(c)$. Moreover, the noise of $c'$ becomes a constant that is determined by the parameters of TFHE and is independent of $c$.

$\mathrm{CIRCUITBOOTSTRAPPING}_{BK}(c) : \mathrm{TLWE} \to \mathrm{TRGSW}$
Given a bootstrapping key BK and a TLWE ciphertext $c$, CIRCUITBOOTSTRAPPING outputs a TRGSW ciphertext $d$ where $\mathrm{Dec}(d) = \mathrm{Dec}(c)$. The noise of $d$ becomes a constant that is determined by the parameters of TFHE and is independent of $c$.


1 Note that bootstrapping here has nothing to do with bootstrapping in statistics. 
Algorithm 1: The leveled homomorphic offline algorithm [13].

Input: A binary DFA M = (Q, Σ = 𝔹, δ, q_0, F) and TRGSW monitored ciphertexts d_1, d_2, ..., d_n
Output: A TLWE ciphertext c satisfying Dec(c) = M(Dec(d_1)Dec(d_2)...Dec(d_n))
1  for q ∈ Q do
2      c_{n,q} ← q ∈ F ? TRIVIAL(1) : TRIVIAL(0)    // Initialize each c_{n,q}
3  for i = n, n − 1, ..., 1 do
4      for q ∈ Q such that q is reachable from q_0 by (i − 1) transitions do
5          c_{i−1,q} ← CMux(d_i, c_{i,δ(q,1)}, c_{i,δ(q,0)})
6  c ← SAMPLEEXTRACT(0, c_{0,q_0})
7  return c


These bootstrapping operations are used to keep the noise of a TFHE cipher- 
text small enough to be correctly decrypted. BOOTSTRAPPING and CIRCUIT- 
BOOTSTRAPPING are almost two and three orders of magnitude slower than 
CMvx, respectively [13]. 


Parameters for TFHE. There are many parameters for TFHE, such as the 
length N of the message of a TRLWE ciphertext and the standard deviation of the 
probability distribution from which a noise is taken. Certain properties of TFHE 
depend on these parameters. These properties include the security level of TFHE, the number of TFHE operations that can be applied without bootstrapping while still ensuring correct decryption, and the time and space complexity of each operation.
The complete list of TFHE parameters is presented in the full version [4]. 

We remark that we need to determine the TFHE parameters before perform- 
ing any TFHE operation. Therefore, we need to know the number of applications 
of homomorphic operations without bootstrapping in advance, i.e., the homo- 
morphic circuit depth must be determined a priori. 


2.3 Leveled Homomorphic Offline Algorithm 


Chillotti et al. [13] proposed an offline algorithm to evaluate a DFA over TFHE ciphertexts (Algorithm 1). Given a DFA $M$ and TRGSW ciphertexts $d_1, d_2, \ldots, d_n$, Algorithm 1 returns a TLWE ciphertext $c$ satisfying $\mathrm{Dec}(c) = M(\mathrm{Dec}(d_1)\mathrm{Dec}(d_2)\cdots\mathrm{Dec}(d_n))$. For simplicity, for a state $q$ of $M$, we write $M^i(q)$ for $M(q, \mathrm{Dec}(d_i)\mathrm{Dec}(d_{i+1})\cdots\mathrm{Dec}(d_n))$.

In Algorithm 1, we use a TRLWE ciphertext $c_{i,q}$ whose first element represents $M^{i+1}(q)$, i.e., whether we reach a final state by reading $\mathrm{Dec}(d_{i+1})\mathrm{Dec}(d_{i+2})\cdots\mathrm{Dec}(d_n)$ from $q$. We abuse this notation for $i = n$, i.e., the first element of $c_{n,q}$ represents whether $q \in F$. In Lines 1 and 2, we initialize $c_{n,q}$: for each $q \in Q$, we let $c_{n,q}$ be TRIVIAL(1) if $q \in F$; otherwise, we let $c_{n,q}$ be TRIVIAL(0). In Lines 3-5, we construct $c_{i-1,q}$ inductively by feeding each monitored ciphertext $d_i$ to CMux from tail to head. Here, $c_{i-1,q}$ represents $M^i(q)$ because $M^i(q) = M^{i+1}(\delta(q, \mathrm{Dec}(d_i)))$. We note that, for efficiency, we only construct $c_{i-1,q}$ for the states reachable from $q_0$ by $i - 1$ transitions. In Line 6, we extract the first element of $c_{0,q_0}$, which represents $M^1(q_0)$, i.e., $M(\mathrm{Dec}(d_1)\mathrm{Dec}(d_2)\cdots\mathrm{Dec}(d_n))$.
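The following plaintext simulation of Algorithm 1 (our own sketch: no encryption, and each $c_{i,q}$ is modelled by the first element of its plaintext vector) shows the back-to-front consumption, the restriction to states reachable by $i - 1$ transitions, and the $O(n|Q|)$ count of CMux applications. The DFA in the usage check, which accepts words containing `11`, is a hypothetical example.

```python
from itertools import product
from typing import List, Set, Tuple


def offline_leveled(delta: List[List[int]], q0: int, final: Set[int],
                    bits: List[int]) -> Tuple[int, int]:
    """Plaintext analogue of Algorithm 1; returns (M(bits), number of CMux applications)."""
    n = len(bits)
    # reach[i]: states reachable from q0 by exactly i transitions (plaintext, public).
    reach = [{q0}]
    for _ in range(n - 1):
        reach.append({delta[q][s] for q in reach[-1] for s in (0, 1)})
    # Lines 1-2: c_{n,q} = Trivial([q in F]) for every state q.
    c = {q: int(q in final) for q in range(len(delta))}
    cmux_count = 0
    # Lines 3-5: consume bits from back to front; only the previous layer is kept.
    for i in range(n, 0, -1):
        d = bits[i - 1]
        # Plaintext CMux(d_i, c_{i,delta(q,1)}, c_{i,delta(q,0)})
        c = {q: c[delta[q][1]] if d == 1 else c[delta[q][0]] for q in reach[i - 1]}
        cmux_count += len(reach[i - 1])
    # Line 6: the first element of c_{0,q0}.
    return c[q0], cmux_count


def dfa_accepts(delta: List[List[int]], q0: int, final: Set[int], bits: List[int]) -> int:
    """Reference: run the DFA forwards on plaintext bits."""
    q = q0
    for s in bits:
        q = delta[q][s]
    return int(q in final)


# Usage: compare against forward evaluation on all words of length <= 6.
DELTA = [[0, 1], [0, 2], [2, 2]]  # hypothetical DFA: "contains 11"
for length in range(7):
    for w in product((0, 1), repeat=length):
        result, count = offline_leveled(DELTA, 0, {2}, list(w))
        assert result == dfa_accepts(DELTA, 0, {2}, list(w))
        assert count <= length * len(DELTA)   # O(n|Q|) CMux applications
```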
Algorithm 2: Our fully homomorphic offline algorithm (OFFLINE).

Input: A binary DFA M = (Q, Σ = 𝔹, δ, q_0, F), TRGSW monitored ciphertexts d_1, d_2, ..., d_n, a bootstrapping key BK, and I_boot ∈ ℕ⁺
Output: A TLWE ciphertext c satisfying Dec(c) = M(Dec(d_1)Dec(d_2)...Dec(d_n))
1  for q ∈ Q do
2      c_{n,q} ← q ∈ F ? TRIVIAL(1) : TRIVIAL(0)
3  for i = n, n − 1, ..., 1 do
4      for q ∈ Q such that q is reachable from q_0 by (i − 1) transitions do
5          c_{i−1,q} ← CMux(d_i, c_{i,δ(q,1)}, c_{i,δ(q,0)})
6      if (n − i + 1) mod I_boot = 0 then
7          for q ∈ Q such that q is reachable from q_0 by (i − 1) transitions do
8              c_{i−1,q} ← SAMPLEEXTRACT(0, c_{i−1,q})
9              c_{i−1,q} ← BOOTSTRAPPING_BK(c_{i−1,q})
10 c ← SAMPLEEXTRACT(0, c_{0,q_0})
11 return c


Theorem 1 (Correctness [13, Thm. 5.4]). Given a binary DFA $M$ and TRGSW ciphertexts $d_1, d_2, \ldots, d_n$, if $c$ in Algorithm 1 can be correctly decrypted, Algorithm 1 outputs $c$ satisfying $\mathrm{Dec}(c) = M(\mathrm{Dec}(d_1)\mathrm{Dec}(d_2)\cdots\mathrm{Dec}(d_n))$.


Complexity Analysis. The time complexity of Algorithm 1 is determined by the number of applications of CMux, which is $O(n|Q|)$. Its space complexity is $O(|Q|)$ because we can use two sets of $|Q|$ TRLWE ciphertexts alternately for $c_{2j-1,q}$ and $c_{2j,q}$ (for $j \in \mathbb{N}^+$).


Shortcomings of Algorithm 1. We cannot use Algorithm 1 under an online 
setting due to two reasons. Firstly, Algorithm 1 is a leveled homomorphic algo- 
rithm, i.e., the maximum length of the ciphertexts that Algorithm 1 can handle 
is determined by TFHE parameters. This is because Algorithm 1 does not use 
BOOTSTRAPPING, and if the monitored ciphertexts are too long, the result c can- 
not be correctly decrypted due to the noise. This is critical in an online setting 
because we do not know the length n of the monitored ciphertexts in advance, 
and we cannot determine such parameters appropriately. 

Secondly, Algorithm 1 consumes the monitored ciphertexts from back to front, i.e., the last ciphertext $d_n$ is used at the beginning, and $d_1$ is used at the end. Thus, we cannot start Algorithm 1 before the last input is given.


3 Online Algorithms for Running DFA Obliviously 


In this section, we propose two online algorithms that run a DFA obliviously. As 
a preparation for these online algorithms, we also introduce a fully homomorphic 
offline algorithm based on Algorithm 1. 


3.1 Preparation: Fully Homomorphic Offline Algorithm (OFFLINE) 


As preparation for introducing an algorithm that can run a DFA under an online 
setting, we enhance Algorithm 1 so that we can monitor a sequence of ciphertexts 
whose length is unknown a priori. Algorithm 2 shows our fully homomorphic 
offline algorithm (OFFLINE), which does not require the TFHE parameters to depend on the length of the monitored ciphertexts. The key difference from Algorithm 1 lies in Lines 6-9 of Algorithm 2. Here, every time $I_{\mathit{boot}}$ monitored ciphertexts have been consumed, we reduce the noise by applying BOOTSTRAPPING to the ciphertexts $c_{i-1,q}$ representing the states of the DFA. Since the amount of noise accumulated in $c_{i-1,q}$ is determined only by the number of ciphertexts processed since the last bootstrapping, we can keep the noise level of $c_{i-1,q}$ low and ensure that the monitoring result $c$ is correctly decrypted. Therefore, by using Algorithm 2, we can monitor an arbitrarily long sequence of ciphertexts as long as the interval $I_{\mathit{boot}}$ is properly chosen according to the TFHE parameters. We note that we still cannot use Algorithm 2 for online monitoring because it consumes the monitored ciphertexts from back to front.

Algorithm 3: Our first online algorithm (REVERSE).

Input: A binary DFA M, TRGSW monitored ciphertexts d_1, d_2, d_3, ..., d_n, a bootstrapping key BK, and I_boot ∈ ℕ⁺
Output: For every i ∈ {1, 2, ..., n}, a TLWE ciphertext c_i satisfying Dec(c_i) = M(Dec(d_1)Dec(d_2)...Dec(d_i))
1  let M^R = (Q^R, 𝔹, δ^R, q_0^R, F^R) be the minimum reversed DFA of M
2  for q^R ∈ Q^R do
3      c_{0,q^R} ← q^R ∈ F^R ? TRIVIAL(1) : TRIVIAL(0)
4  for i = 1, 2, ..., n do
5      for q^R ∈ Q^R do
6          c_{i,q^R} ← CMux(d_i, c_{i−1,δ^R(q^R,1)}, c_{i−1,δ^R(q^R,0)})
7      if i mod I_boot = 0 then
8          for q^R ∈ Q^R do
9              c_{i,q^R} ← SAMPLEEXTRACT(0, c_{i,q^R})
10             c_{i,q^R} ← BOOTSTRAPPING_BK(c_{i,q^R})
11     c_i ← SAMPLEEXTRACT(0, c_{i,q_0^R})
12     output c_i
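The effect of Lines 6-9 of Algorithm 2 can be illustrated with a toy noise counter. The sketch below is a plaintext simulation under an assumed noise model that is not TFHE's actual noise analysis: each CMux output carries one unit more noise than its noisiest input, and BOOTSTRAPPING resets the noise to a constant (taken to be 0). Under this assumption, the noise never exceeds $I_{\mathit{boot}}$, whereas without bootstrapping it grows with $n$. The function name and example DFA are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def offline(delta: List[List[int]], q0: int, final: Set[int],
            bits: List[int], i_boot: int) -> Tuple[int, int]:
    """Plaintext analogue of Algorithm 2; returns (M(bits), largest toy noise level seen)."""
    if i_boot < 1:
        raise ValueError("i_boot must be positive")
    n = len(bits)
    reach = [{q0}]
    for _ in range(n - 1):
        reach.append({delta[q][s] for q in reach[-1] for s in (0, 1)})
    # Each entry is (plaintext value, toy noise); trivial samples carry no noise.
    c: Dict[int, Tuple[int, int]] = {q: (int(q in final), 0) for q in range(len(delta))}
    max_noise = 0
    for i in range(n, 0, -1):
        d = bits[i - 1]
        layer: Dict[int, Tuple[int, int]] = {}
        for q in reach[i - 1]:                                   # Lines 4-5
            hi, lo = c[delta[q][1]], c[delta[q][0]]
            value = hi[0] if d == 1 else lo[0]
            layer[q] = (value, max(hi[1], lo[1]) + 1)            # assumed noise growth
        c = layer
        max_noise = max(max_noise, max(noise for _, noise in c.values()))
        if (n - i + 1) % i_boot == 0:                            # Lines 6-9
            c = {q: (value, 0) for q, (value, _) in c.items()}   # noise reset
    return c[q0][0], max_noise


DELTA = [[0, 1], [0, 2], [2, 2]]              # hypothetical DFA: "contains 11"
bits = [(i // 3) % 2 for i in range(200)]
print(offline(DELTA, 0, {2}, bits, 4))        # (1, 4)
```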


3.2 Online Algorithm 1: REVERSE 


To run a DFA online, we modify OFFLINE so that the monitored ciphertexts are consumed from front to back. Our main idea is illustrated in Fig. 2b: we reverse the DFA $M$ beforehand and feed the ciphertexts $d_1, d_2, \ldots, d_n$ to the reversed DFA $M^R$ serially from $d_1$ to $d_n$.

Algorithm 3 shows the outline of our first online algorithm (REVERSE) based on the above idea. REVERSE takes the same inputs as OFFLINE: a DFA $M$, TRGSW ciphertexts $d_1, d_2, \ldots, d_n$, a bootstrapping key BK, and a positive integer $I_{\mathit{boot}}$ indicating the interval of bootstrapping. In Line 1, we construct the minimum DFA $M^R$ such that, for any $w = \sigma_1\sigma_2\cdots\sigma_n \in \mathbb{B}^*$, we have $M^R(w) = M(w^R)$, where $w^R = \sigma_n\cdots\sigma_1$. We can construct such a DFA by reversing the transitions and applying the powerset construction and the minimization algorithm.
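Line 1 of Algorithm 3 requires a reversed DFA. The Python sketch below (our own, not the authors' implementation) reverses the transitions of a binary DFA and applies the powerset construction restricted to reachable subsets; it omits the minimization step, so the result satisfies $M^R(w) = M(w^R)$ but need not be minimum. The usage check compares $M^R$ with $M$ on all reversed words up to length 6 for a hypothetical DFA accepting words whose second symbol is 1.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple


def reverse_dfa(delta: List[List[int]], q0: int,
                final: Set[int]) -> Tuple[List[List[int]], int, Set[int]]:
    """Build a DFA M^R with M^R(w) = M(w^R) by reversing edges + subset construction.

    The subset reached by reading w contains exactly the states p of M with
    delta(p, w^R) in F. Minimization is omitted.
    """
    # pred[q][s] = {p | delta(p, s) = q}: the transitions of M, reversed.
    pred: List[List[Set[int]]] = [[set(), set()] for _ in delta]
    for p, row in enumerate(delta):
        for s in (0, 1):
            pred[row[s]][s].add(p)
    start = frozenset(final)
    index: Dict[FrozenSet[int], int] = {start: 0}
    subsets = [start]
    delta_r: List[List[int]] = []
    k = 0
    while k < len(subsets):        # breadth-first over reachable subsets
        row_r = []
        for s in (0, 1):
            nxt = frozenset(p for q in subsets[k] for p in pred[q][s])
            if nxt not in index:
                index[nxt] = len(subsets)
                subsets.append(nxt)
            row_r.append(index[nxt])
        delta_r.append(row_r)
        k += 1
    final_r = {i for i, subset in enumerate(subsets) if q0 in subset}
    return delta_r, 0, final_r


def accepts(delta: List[List[int]], q0: int, final: Set[int], word) -> bool:
    q = q0
    for s in word:
        q = delta[q][s]
    return q in final


# Usage: states 0, 1 count symbols, 2 accepts (sink), 3 rejects (sink).
DELTA = [[1, 1], [3, 2], [2, 2], [3, 3]]
DELTA_R, Q0_R, FINAL_R = reverse_dfa(DELTA, 0, {2})
for length in range(7):
    for w in product((0, 1), repeat=length):
        assert accepts(DELTA_R, Q0_R, FINAL_R, w) == accepts(DELTA, 0, {2}, w[::-1])
```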

In the loop of Lines 4-12, the reversed DFA $M^R$ consumes each monitored ciphertext $d_i$, which corresponds to the loop of Lines 3-9 in Algorithm 2. The main difference lies in Lines 5 and 8: Algorithm 3 applies CMux and
BOOTSTRAPPING to all the states of $M^R$, while Algorithm 2 only considers the states reachable from the initial state. This is because in online monitoring, we monitor a stream of ciphertexts without knowing the number of remaining ciphertexts, and, by the minimality of $M^R$, every state of $M^R$ is potentially reachable from the initial state $q_0^R$ by the reversed remaining ciphertexts $d_n, d_{n-1}, \ldots, d_{i+1}$.

Algorithm 4: Our second online algorithm (BLOCK).

Input: A binary DFA M = (Q, Σ = 𝔹, δ, q_0, F), TRGSW monitored ciphertexts d_1, d_2, d_3, ..., d_n, a bootstrapping key BK, and B ∈ ℕ⁺
Output: For every i ∈ ℕ⁺ (i ≤ ⌊n/B⌋), a TLWE ciphertext c_i satisfying Dec(c_i) = M(Dec(d_1)Dec(d_2)...Dec(d_{i×B}))
1  S_1 ← {q_0}    // S_i: the states reachable by (i − 1) × B transitions
2  for i = 1, 2, ..., ⌊n/B⌋ do
3      S_{i+1} ← {q ∈ Q | ∃s_i ∈ S_i. q is reachable from s_i by B transitions}
       // We denote S_{i+1} = {s_1^{i+1}, s_2^{i+1}, ..., s_{|S_{i+1}|}^{i+1}}
4      for q ∈ Q do
5          if q ∈ S_{i+1} then
6              j ← the index of S_{i+1} such that q = s_j^{i+1}
7              c^{T_i}_{B,q} ← TRIVIAL((j − 1) × 2 + (q ∈ F ? 1 : 0))
8      for k = B, B − 1, ..., 1 do
9          for q ∈ Q such that q is reachable from a state in S_i by (k − 1) transitions do
10             c^{T_i}_{k−1,q} ← CMux(d_{(i−1)B+k}, c^{T_i}_{k,δ(q,1)}, c^{T_i}_{k,δ(q,0)})
11     if |S_i| = 1 then
12         c^{T_i}_{cur} ← c^{T_i}_{0,q}, where S_i = {q}
13     else
14         for l = 1, 2, ..., ⌈log(|S_i|)⌉ do
15             c_l ← SAMPLEEXTRACT(l, c^{T_{i−1}}_{cur})
16             d_l ← CIRCUITBOOTSTRAPPING_BK(c_l)
17         c^{T_i}_{cur} ← LOOKUP({c^{T_i}_{0,s_1^i}, c^{T_i}_{0,s_2^i}, ..., c^{T_i}_{0,s_{|S_i|}^i}}, {d_l}_{l=1}^{⌈log(|S_i|)⌉})
18     c_i ← SAMPLEEXTRACT(0, c^{T_i}_{cur})
19     output c_i


Theorem 2. Given a binary DFA $M$, TRGSW ciphertexts $d_1, d_2, \ldots, d_n$, a bootstrapping key BK, and a positive integer $I_{\mathit{boot}}$, for each $i \in \{1, 2, \ldots, n\}$, if $c_i$ in Algorithm 3 can be correctly decrypted, Algorithm 3 outputs $c_i$ satisfying $\mathrm{Dec}(c_i) = M(\mathrm{Dec}(d_1)\mathrm{Dec}(d_2)\cdots\mathrm{Dec}(d_i))$.

Proof (sketch). SAMPLEEXTRACT and BOOTSTRAPPING in Lines 9 and 10 do not change the decrypted values of $c_{i,q^R}$. Therefore, $\mathrm{Dec}(c_i) = M^R(\mathrm{Dec}(d_i)\cdots\mathrm{Dec}(d_1))$ for $i \in \{1, 2, \ldots, n\}$ by Theorem 1. As $M^R$ is the reversed DFA of $M$, we have $\mathrm{Dec}(c_i) = M^R(\mathrm{Dec}(d_i)\cdots\mathrm{Dec}(d_1)) = M(\mathrm{Dec}(d_1)\cdots\mathrm{Dec}(d_i))$.
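A plaintext simulation of the main loop of Algorithm 3 (our own sketch; ciphertexts are replaced by plaintext bits, and the bootstrapping steps, which only reduce noise, are omitted) shows how the reversed DFA lets the monitored values be consumed front to back while still yielding $M(\mathrm{Dec}(d_1)\cdots\mathrm{Dec}(d_i))$ after every input. The reversed DFA in the usage line is hand-written for a hypothetical $M$ that accepts words ending with 1.

```python
from typing import Iterable, Iterator, List, Set


def reverse_online(delta_r: List[List[int]], q0_r: int, final_r: Set[int],
                   stream: Iterable[int]) -> Iterator[int]:
    """Plaintext analogue of Algorithm 3 (REVERSE): yields M(d_1 ... d_i) for each i.

    c[q] stands for the first element of c_{i,q^R}.
    """
    # Lines 2-3: c_{0,q^R} = Trivial([q^R in F^R]) for every state of M^R.
    c = [int(q in final_r) for q in range(len(delta_r))]
    for d in stream:                                   # Line 4: front to back
        # Lines 5-6: every state of M^R is updated, since any may still be needed.
        c = [c[delta_r[q][1]] if d == 1 else c[delta_r[q][0]]
             for q in range(len(delta_r))]
        yield c[q0_r]                                  # Lines 11-12


# M accepts words ending with 1, so M^R accepts words starting with 1
# (state 0: start, 1: accepting sink, 2: rejecting sink).
DELTA_R = [[2, 1], [1, 1], [2, 2]]
print(list(reverse_online(DELTA_R, 0, {1}, [0, 1, 1, 0, 1])))  # [0, 1, 1, 0, 1]
```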


3.3 Online Algorithm 2: BLOCK 


A problem of REVERSE is that the number of states of the reversed DFA can explode exponentially due to the powerset construction (see Sect. 3.4 for the details). Another idea for an online algorithm, which avoids reversing the DFA, is illustrated in Fig. 2c: we split the monitored ciphertexts into blocks of fixed size $B$ and process each block in the same way as Algorithm 2. Intuitively, for each block $d_{1+(i-1)\times B}, d_{2+(i-1)\times B}, \ldots, d_{B+(i-1)\times B}$ of ciphertexts, we compute the function $T_i\colon Q \to Q$ satisfying $T_i(q) = \delta(q, \mathrm{Dec}(d_{1+(i-1)\times B})\mathrm{Dec}(d_{2+(i-1)\times B})\cdots\mathrm{Dec}(d_{B+(i-1)\times B}))$ by a variant of OFFLINE, and keep track of the current state of the DFA after reading the current prefix $d_1, d_2, \ldots, d_{B+(i-1)\times B}$.

Algorithm 4 shows the outline of our second online algorithm (BLOCK) based on the above idea. Algorithm 4 takes a DFA $M$, TRGSW ciphertexts $d_1, d_2, \ldots, d_n$, a bootstrapping key BK, and an integer $B$ representing the interval of output. To simplify the presentation, we make the following assumptions, which are relaxed later: 1) $B$ is small enough that a trivial TRLWE sample can be correctly decrypted after $B$ applications of CMux; 2) the number $|Q|$ of states of the DFA $M$ is at most $2^{N-1}$, where $N$ is the length of TRLWE.

The main loop of the algorithm spans Lines 2-19. In each iteration, we consume the $i$-th block consisting of $B$ ciphertexts, i.e., $d_{(i-1)B+1}, \ldots, d_{(i-1)B+B}$. In Line 3, we compute the set $S_{i+1} = \{s_1^{i+1}, s_2^{i+1}, \ldots, s_{|S_{i+1}|}^{i+1}\}$ of the states reachable from $q_0$ by reading a word of length $i \times B$.

In Lines 4-10, for each $q \in Q$, we construct a ciphertext representing $T_i(q)$ by feeding the $i$-th block to a variant of OFFLINE. More precisely, we construct a ciphertext $c_{0,q}^{T_i}$ representing the pair of the Boolean value showing whether $T_i(q) \in F$ and the state $T_i(q) \in Q$. Such a pair is encoded in a TRLWE ciphertext as follows: the first element shows whether $T_i(q) \in F$, and the other elements are the binary representation of $j - 1$, where $j \in \mathbb{N}^+$ is such that $s_j^{i+1} = T_i(q)$ (cf. Line 7).

In Lines 11-17, we construct the ciphertext $c_{\mathit{cur}}^{T_i}$ representing the state of the DFA $M$ after reading the current prefix $d_1, d_2, \ldots, d_{B+(i-1)\times B}$. If $|S_i| = 1$, since the unique element $q$ of $S_i$ is the only possible state before consuming the current block, the state after reading it is $T_i(q)$. Therefore, we let $c_{\mathit{cur}}^{T_i} = c_{0,q}^{T_i}$.

Otherwise, we select $c_{0,q}^{T_i}$ for the state $q$ reached before consuming the current block, and let $c_{\mathit{cur}}^{T_i} = c_{0,q}^{T_i}$. Since the elements of $c_{\mathit{cur}}^{T_{i-1}}$ other than the first one represent $q$ (see Line 7), we extract them by applying SAMPLEEXTRACT (Line 15) and convert them to TRGSW by applying CIRCUITBOOTSTRAPPING (Line 16). Then, we choose $c_{0,q}^{T_i}$ by applying LOOKUP and set it to $c_{\mathit{cur}}^{T_i}$ (Line 17).

The output after consuming the current block, i.e., $M(\mathrm{Dec}(d_1)\mathrm{Dec}(d_2)\cdots\mathrm{Dec}(d_{(i-1)B+B}))$, is stored in the first element of the TRLWE ciphertext $c_{\mathit{cur}}^{T_i}$. It is extracted by applying SAMPLEEXTRACT in Line 18 and output in Line 19.
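The bookkeeping of Algorithm 4 can likewise be simulated in plaintext. In the sketch below (ours; no encryption and no bootstrapping), a TRLWE plaintext is modelled as an integer whose bit 0 states whether the state is final and whose higher bits hold $j - 1$, matching TRIVIAL$((j-1) \times 2 + (q \in F\ ?\ 1 : 0))$ in Line 7; the LOOKUP of Line 17 is modelled by reading those higher bits. Inputs that do not fill a final block produce no output, as in Algorithm 4. The example DFA is hypothetical.

```python
from typing import Iterable, Iterator, List, Set


def block_online(delta: List[List[int]], q0: int, final: Set[int],
                 stream: Iterable[int], B: int) -> Iterator[int]:
    """Plaintext analogue of Algorithm 4 (BLOCK); yields M(d_1 ... d_{iB}) per full block."""
    S = [q0]          # S_i as an ordered list, so s_j^i = S[j - 1]; S_1 = {q0}
    cur = 0           # plays the role of c_cur^{T_{i-1}} (unused while |S_i| = 1)
    block: List[int] = []
    for d in stream:
        block.append(d)
        if len(block) < B:
            continue
        # Line 3: layers[k] = states reachable from S_i by k transitions (public).
        layers = [set(S)]
        for _ in range(B):
            layers.append({delta[q][s] for q in layers[-1] for s in (0, 1)})
        S_next = sorted(layers[B])
        # Lines 4-7: c_{B,q} = Trivial((j - 1) * 2 + [q in F]) for q = s_j^{i+1}.
        c = {q: j * 2 + int(q in final) for j, q in enumerate(S_next)}
        # Lines 8-10: consume the block back to front; plaintext CMux picks delta(q, d).
        for k in range(B, 0, -1):
            d_k = block[k - 1]
            c = {q: c[delta[q][d_k]] for q in layers[k - 1]}
        # Lines 11-17: select c_{0,q}^{T_i} for the state q reached before this block.
        if len(S) == 1:
            cur = c[S[0]]
        else:
            n_bits = (len(S) - 1).bit_length()                      # ceil(log2 |S_i|)
            bits = [(cur >> l) & 1 for l in range(1, n_bits + 1)]  # SampleExtract(l, .)
            index = sum(bit << (l - 1) for l, bit in enumerate(bits, start=1))
            cur = c[S[index]]                                       # LookUp: entry index + 1
        yield cur & 1                                               # Line 18: element 0
        S, block = S_next, []


DELTA = [[0, 1], [0, 2], [2, 2]]   # hypothetical DFA: "contains 11"
print(list(block_online(DELTA, 0, {2}, [0, 1, 0, 1, 1, 0], B=2)))  # [0, 0, 1]
```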


Theorem 3. Given a binary DFA $M$, TRGSW ciphertexts $d_1, d_2, \ldots, d_n$, a bootstrapping key BK, and a positive integer $B$, for each $i \in \{1, 2, \ldots, \lfloor n/B \rfloor\}$, if $c_i$ in Algorithm 4 can be correctly decrypted, Algorithm 4 outputs a TLWE ciphertext $c_i$ satisfying $\mathrm{Dec}(c_i) = M(\mathrm{Dec}(d_1)\mathrm{Dec}(d_2)\cdots\mathrm{Dec}(d_{i\times B}))$.


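Proof (sketch). Let $q^i$ be $\delta(q_0, \mathrm{Dec}(d_1)\mathrm{Dec}(d_2)\cdots\mathrm{Dec}(d_{i\times B}))$. It suffices to show that, for each iteration $i$ of Line 2, $\mathrm{Dec}(c_{\mathit{cur}}^{T_i})$ represents the pair of the Boolean value showing whether $q^i \in F$ and the state $q^i \in Q$ in the above encoding format. This is because $c_i$ represents the first element of $c_{\mathit{cur}}^{T_i}$. Algorithm 4 selects $c_{\mathit{cur}}^{T_i}$ from $\{c_{0,q}^{T_i}\}_{q \in S_i}$ in Line 12 or Line 17. By using a slight variant of Theorem 1 for Lines 4-10, we can show that $c_{0,q}^{T_i}$ represents whether $T_i(q) \in F$ and the state $T_i(q)$. Therefore, the proof is completed by showing $\mathrm{Dec}(c_{\mathit{cur}}^{T_i}) = \mathrm{Dec}(c_{0,q^{i-1}}^{T_i})$.

We prove $\mathrm{Dec}(c_{\mathit{cur}}^{T_i}) = \mathrm{Dec}(c_{0,q^{i-1}}^{T_i})$ by induction on $i$. If $i = 1$, $|S_i| = 1$ holds, and by $q^{i-1} \in S_i$, we have $\mathrm{Dec}(c_{\mathit{cur}}^{T_i}) = \mathrm{Dec}(c_{0,q^{i-1}}^{T_i})$. If $i > 1$ and $|S_i| = 1$, $\mathrm{Dec}(c_{\mathit{cur}}^{T_i}) = \mathrm{Dec}(c_{0,q^{i-1}}^{T_i})$ holds similarly. If $i > 1$ and $|S_i| > 1$, by the induction hypothesis, $\mathrm{Dec}(c_{\mathit{cur}}^{T_{i-1}})$ represents whether $T_{i-1}(q^{i-2}) = q^{i-1} \in F$ and the state $q^{i-1}$. By construction in Line 16, $\mathrm{Dec}(d_l)$ is equal to the $l$-th bit of $j - 1$, where $j$ is such that $s_j^i = q^{i-1}$. Therefore, the result of the application of LOOKUP in Line 17 is equivalent to $c_{0,s_j^i}^{T_i}$ ($= c_{0,q^{i-1}}^{T_i}$), and we have $\mathrm{Dec}(c_{\mathit{cur}}^{T_i}) = \mathrm{Dec}(c_{0,q^{i-1}}^{T_i})$.

Table 3. Complexity of the proposed algorithms with respect to the number |Q| of the states of the DFA and the size |φ| of the LTL formula. For BLOCK, we show the complexity before the relaxation.

Algorithm | w.r.t. | # CMux | # BOOTSTRAPPING | # CIRCUITBOOTSTRAPPING | Space
OFFLINE | DFA | $O(n\lvert Q\rvert)$ | $O(n\lvert Q\rvert/I_{\mathit{boot}})$ | - | $O(\lvert Q\rvert)$
OFFLINE | LTL | $O(n2^{2^{\lvert\varphi\rvert}})$ | $O(n2^{2^{\lvert\varphi\rvert}}/I_{\mathit{boot}})$ | - | $O(2^{2^{\lvert\varphi\rvert}})$
REVERSE | DFA | $O(n2^{\lvert Q\rvert})$ | $O(n2^{\lvert Q\rvert}/I_{\mathit{boot}})$ | - | $O(2^{\lvert Q\rvert})$
REVERSE | LTL | $O(n2^{\lvert\varphi\rvert})$ | $O(n2^{\lvert\varphi\rvert}/I_{\mathit{boot}})$ | - | $O(2^{\lvert\varphi\rvert})$
BLOCK | DFA | $O(n\lvert Q\rvert)$ | - | $O((n\log\lvert Q\rvert)/B)$ | $O(\lvert Q\rvert)$
BLOCK | LTL | $O(n2^{2^{\lvert\varphi\rvert}})$ | - | $O(n2^{\lvert\varphi\rvert}/B)$ | $O(2^{2^{\lvert\varphi\rvert}})$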


We note that BLOCK generates an output once every $B$ monitored ciphertexts, while REVERSE generates an output for every monitored ciphertext.

We also remark that when B = 1, BLOCK consumes every monitored cipher- 
text from front to back. However, such a setting is slow due to a huge number 
of CIRCUITBOOTSTRAPPING operations, as pointed out in Sect. 3.4. 


Relaxations of the Assumptions. When $B$ is too large, $c_{0,q}^{T_i}$ may not be correctly decrypted. We can relax this restriction by inserting BOOTSTRAPPING just after Line 10, much like Algorithm 2. When the number $|Q|$ of states of the DFA $M$ is larger than $2^{N-1}$, we cannot store the index $j$ of the state using one TRLWE ciphertext (Line 7). We can relax this restriction by using multiple TRLWE ciphertexts for $c_{k,q}^{T_i}$ and $c_{\mathit{cur}}^{T_i}$.


3.4 Complexity Analysis 


Table 3 summarizes the complexity of our algorithms with respect to both the number $|Q|$ of the states of the DFA and the size $|\varphi|$ of the LTL formula. We note that, for BLOCK, we do not relax the above assumptions for simplicity. Notice that the number of applications of the homomorphic operations is linear in the length $n$ of the monitored ciphertext. Moreover, the space complexity is independent of $n$. This shows that our algorithms satisfy the properties essential to good online monitoring: 1) they store only the minimum amount of data, and 2) they run quickly enough under a real-time setting [5].

The time and the space complexity of OFFLINE and BLOCK are linear in $|Q|$. Moreover, in these algorithms, when the $i$-th monitored ciphertext is consumed, only the states reachable by a word of length $i$ are considered, which often improves the scalability further. In contrast, the time and the space complexity of REVERSE is exponential in $|Q|$. This is because of the worst-case size of the reversed DFA due to the powerset construction. Since the size of the reversed DFA is usually reasonably small, the practical scalability of REVERSE is also much better than this worst case, as observed in the experiments in Sect. 5.
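The worst-case blow-up mentioned above comes from determinizing the reversed automaton. The following Python sketch shows one standard way to build a reversed DFA by the subset (powerset) construction; it is our illustration, not necessarily the construction used in the paper's implementation, and the names `reverse_dfa` and `accepts_reversed` are hypothetical. The reversed DFA reads a word from back to front: its initial state is the set $F$ of accepting states, and a subset is accepting iff it contains $q_0$. Only subsets reachable from $F$ are built, which is why the result is often far smaller than $2^{|Q|}$.

```python
def reverse_dfa(states, alphabet, delta, q0, final):
    """Build a DFA for the reversed language via the subset construction.

    delta: dict mapping (state, symbol) -> state of a complete DFA.
    Returns (subsets, trans, start, accepting); each reversed state is a frozenset.
    """
    # Predecessor map: pred[(q, a)] = {p | delta(p, a) = q}
    pred = {(q, a): set() for q in states for a in alphabet}
    for (p, a), q in delta.items():
        pred[(q, a)].add(p)
    start = frozenset(final)
    subsets, trans, todo = {start}, {}, [start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(p for q in S for p in pred[(q, a)])
            trans[(S, a)] = T
            if T not in subsets:
                subsets.add(T)
                todo.append(T)
    accepting = {S for S in subsets if q0 in S}
    return subsets, trans, start, accepting

def accepts_reversed(trans, start, accepting, word):
    """Run the reversed DFA on the word read from back to front."""
    S = start
    for a in reversed(word):
        S = trans[(S, a)]
    return S in accepting
```

On the modulo-3 counter (`delta = {(q, b): (q + b) % 3 ...}`, $q_0 = 0$, $F = \{0\}$), only three subsets are reachable, so the reversed DFA is no larger than the original one.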

For OFFLINE and BLOCK, $|Q|$ is doubly exponential in $|\varphi|$ because we first convert $\varphi$ to an NFA (one exponential) and then construct a DFA from the NFA (second exponential). In contrast, for REVERSE, it is known that we can construct a reversed DFA for $\varphi$ whose size is at most singly exponential in $|\varphi|$ [15]. Note that, in a practical scenario such as the one in Sect. 5, the size of the DFA constructed from $\varphi$ is expected to be much smaller than in the worst case.


4 Oblivious Online LTL Monitoring 


In this section, we formalize the scheme of oblivious online LTL monitoring. We consider a two-party setting with a client and a server and refer to the client and the server as Alice and Bob, respectively. Here, we assume that Alice has a private data sequence $w = \sigma_1\sigma_2\cdots\sigma_n$ to be monitored, where $\sigma_i \in 2^{AP}$ for each $i \geq 1$. Meanwhile, Bob has a private LTL formula $\varphi$. The purpose of oblivious online LTL monitoring is to let Alice know whether $\sigma_1\sigma_2\cdots\sigma_i \models \varphi$ for each $i \geq 1$, while preserving the privacy of both Alice and Bob.


4.1 Threat Model 


We assume that Alice is malicious, i.e., Alice can deviate arbitrarily from the protocol to try to learn $\varphi$. We also assume that Bob is honest-but-curious, i.e., Bob correctly follows the protocol, but he tries to learn $w$ from the information he obtains
from the protocol execution. We do not assume that Bob is malicious in the present 
paper; a protocol that is secure against malicious Bob requires more sophisticated 
primitives such as zero-knowledge proofs and is left as future work. 


Public and Private Data. We assume that the TFHE parameters, the parameters of our algorithms (e.g., $I_{\mathrm{boot}}$ and $B$), Alice's public key PK, and Alice's bootstrapping key BK are public to both parties. The input $w$ and the monitoring result are private for Alice, and the LTL formula $\varphi$ is private for Bob.


4.2 Protocol Flow 


The protocol flow of oblivious online LTL monitoring is shown in Fig. 3. It takes $\sigma_1, \sigma_2, \ldots, \sigma_n$, $\varphi$, and $b \in \mathbb{B}$ as its parameters, where $b$ is a flag that indicates the algorithm Bob uses: REVERSE ($b = 0$) or BLOCK ($b = 1$). After generating her


Oblivious Online Monitoring for Safety LTL Specification 461 


Input : Alice's private inputs $\sigma_1, \sigma_2, \ldots, \sigma_n \in 2^{AP}$, Bob's private LTL formula $\varphi$, and $b \in \mathbb{B}$
Output : For every $i \in \{1, 2, \ldots, n\}$, Alice's private output representing $\sigma_1\sigma_2\cdots\sigma_i \models \varphi$

1  Alice generates her secret key SK.
2  Alice generates her public key PK and bootstrapping key BK from SK.
3  Alice sends PK and BK to Bob.
4  Bob converts $\varphi$ to a binary DFA $M = (Q, \Sigma = \mathbb{B}, \delta, q_0, F)$.
5  for $i = 1, 2, \ldots, n$ do
6      Alice encodes $\sigma_i$ to a sequence $\sigma'_i := (\sigma'_{i,1}, \sigma'_{i,2}, \ldots, \sigma'_{i,|AP|}) \in \mathbb{B}^{|AP|}$.
7      Alice calculates $d_i := (\mathrm{Enc}(\sigma'_{i,1}), \mathrm{Enc}(\sigma'_{i,2}), \ldots, \mathrm{Enc}(\sigma'_{i,|AP|}))$.
8      Alice sends $d_i$ to Bob.
9      Bob feeds the elements of $d_i$ to REVERSE (if $b = 0$) or BLOCK (if $b = 1$).
       // $\sigma'_1 \cdot \sigma'_2 \cdots \sigma'_i$ refers to $\sigma'_{1,1} \cdots \sigma'_{1,|AP|}\,\sigma'_{2,1} \cdots \sigma'_{i,|AP|}$
10     Bob obtains the output TLWE ciphertext $c$ produced by the algorithm, where $\mathrm{Dec}(c) = M(\sigma'_1 \cdot \sigma'_2 \cdots \sigma'_i)$.
11     Bob randomizes $c$ to obtain $c'$ so that $\mathrm{Dec}(c) = \mathrm{Dec}(c')$.
12     Bob sends $c'$ to Alice.
13     Alice calculates $\mathrm{Dec}(c')$ to obtain the result in plaintext.

Fig. 3. Protocol of oblivious online LTL monitoring.


secret key and sending the corresponding public and bootstrapping keys to Bob (Lines 1-3), Alice encrypts her inputs into ciphertexts and sends the ciphertexts to Bob one by one (Lines 5-8). In contrast, Bob first converts his LTL formula $\varphi$ to a binary DFA $M$ (Line 4). Then, Bob serially feeds the ciphertexts received from Alice to REVERSE or BLOCK (Line 9) and returns the encrypted output of the algorithm to Alice (Lines 10-13).

Note that, although the alphabet of a DFA constructed from an LTL formula is $2^{AP}$ [36], our proposed algorithms require a binary DFA. Thus, in Line 4, we convert the DFA constructed from $\varphi$ to a binary DFA $M$ by inserting auxiliary states. Besides, in Line 6, we encode an observation $\sigma_i \in 2^{AP}$ by a sequence $\sigma'_i = (\sigma'_{i,1}, \ldots, \sigma'_{i,|AP|}) \in \mathbb{B}^{|AP|}$ such that $p_j \in \sigma_i$ if and only if $\sigma'_{i,j}$ is true, where $AP = \{p_1, \ldots, p_{|AP|}\}$. We also note that, taking this encoding into account, we need to set the parameters of BLOCK properly so that it generates an output for each $|AP|$-size block of Alice's inputs, i.e., $B$ is taken to be equal to $|AP|$.
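The encoding of Line 6 and the conversion to a binary DFA in Line 4 can be sketched in plaintext as follows. This is a minimal illustration under our own assumptions (the original DFA over $2^{AP}$ is given as a Python function on frozensets, and auxiliary states are named by hypothetical (state, bit-prefix) pairs); the paper's implementation builds the DFA with Spot and may insert auxiliary states differently. Each original transition is replaced by a path of $|AP|$ binary transitions, so one original step corresponds to one $|AP|$-size block of bits.

```python
def encode(sigma, ap):
    """Encode an observation sigma (a subset of AP) as the bit vector (p_1 in sigma, ..., p_|AP| in sigma)."""
    return tuple(1 if p in sigma else 0 for p in ap)

def binarize(delta, ap):
    """Return a binary transition function equivalent to `delta` on encoded inputs.

    delta(q, sigma) is the original transition over 2^AP. A binary state is a pair
    (q, prefix): q is the original state and prefix holds the bits read so far in the
    current block; (q, ()) corresponds to the original state q.
    """
    k = len(ap)

    def delta_bin(state, bit):
        q, prefix = state
        prefix = prefix + (bit,)
        if len(prefix) < k:
            return (q, prefix)            # auxiliary state inside a block
        sigma = frozenset(p for p, b in zip(ap, prefix) if b)
        return (delta(q, sigma), ())      # block complete: take the original step

    return delta_bin
```

This naive scheme introduces up to $2^{|AP|} - 2$ auxiliary states per original state (one per proper, non-empty bit prefix); a practical construction would share or minimize them.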

Here, we provide brief sketches of the correctness and security analysis of the 
proposed protocol. See the full version [4] for detailed explanations and proofs. 


Correctness. We can show that Alice obtains correct results in our protocol 
directly by Theorem 2 and Theorem 3. 


Security. Intuitively, after the execution of the protocol described in Fig. 3, Alice should learn $M(\sigma'_1 \cdot \sigma'_2 \cdots \sigma'_i)$ for every $i \in \{1, 2, \ldots, n\}$ but nothing else. Besides, Bob should learn the input size $n$ but nothing else.


Privacy for Alice. We observe that Bob only obtains $\mathrm{Enc}(\sigma'_{i,j})$ from Alice for each $i \in \{1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, |AP|\}$. Therefore, we need to show that Bob learns nothing from the ciphertexts generated by Alice. Since TFHE provides IND-CPA security [7], we can easily guarantee the privacy of Alice (the client).


Privacy for Bob. The privacy guarantee for Bob is more complex than that for Alice. Here, Alice obtains $\sigma'_1, \sigma'_2, \ldots, \sigma'_n$ and the results $M(\sigma'_1 \cdot \sigma'_2 \cdots \sigma'_i)$ for every $i \in \{1, 2, \ldots, n\}$ in plaintext. In the protocol (Fig. 3), Alice obtains neither $b$ and $M$ themselves nor their sizes, and it is known that finitely many checks of $M(w)$ cannot uniquely identify $M$ if no additional information (e.g., $|M|$) is given [2,32]. Thus, it is impossible for Alice to identify $M$ (or $\varphi$) from the input/output pairs.

Nonetheless, to fully guarantee the model privacy of Bob, we also need to 
show that, when Alice inspects the result ciphertext c’, it is impossible for Alice 
to know Bob’s specification, i.e., what homomorphic operations were applied by 
Bob to obtain c’. A TLWE ciphertext contains a random nonce and a noise term. 
By randomizing c properly in Line 11, we ensure that the random nonce of c’ 
is not biased [34]. By assuming SRL security [10,21] over TFHE, we can ensure 
that there is no information leakage regarding Bob’s specifications through the 
noise bias. A more detailed discussion is in the full version [4]. 
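The idea of randomizing a ciphertext without changing its plaintext (Line 11) can be illustrated with a toy public-key LWE scheme in the style of Regev: adding a fresh encryption of 0 refreshes the random nonce while preserving the decryption. This is only an illustration with insecure toy parameters chosen by us (`MOD`, `DIM`, `ROWS` and the noise range are assumptions); it is not TFHE's actual TLWE randomization, for which the text refers to [34].

```python
import random

MOD, DIM, ROWS = 2**16, 16, 64   # toy modulus, secret dimension, public-key rows (insecure)

def keygen(rng):
    """Secret s and public key (A, b = A s + e mod MOD) with small noise e."""
    s = [rng.randrange(MOD) for _ in range(DIM)]
    A = [[rng.randrange(MOD) for _ in range(DIM)] for _ in range(ROWS)]
    e = [rng.randint(-4, 4) for _ in range(ROWS)]
    b = [(sum(a * x for a, x in zip(row, s)) + ei) % MOD for row, ei in zip(A, e)]
    return s, (A, b)

def encrypt(pk, bit, rng):
    """Sum a random subset of the public-key rows and add bit * MOD/2."""
    A, b = pk
    r = [rng.randrange(2) for _ in range(ROWS)]
    u = [sum(ri * row[j] for ri, row in zip(r, A)) % MOD for j in range(DIM)]
    v = (sum(ri * bi for ri, bi in zip(r, b)) + bit * (MOD // 2)) % MOD
    return u, v

def decrypt(s, ct):
    u, v = ct
    d = (v - sum(ui * si for ui, si in zip(u, s))) % MOD   # = noise + bit * MOD/2 (mod MOD)
    return 1 if MOD // 4 <= d < 3 * MOD // 4 else 0

def rerandomize(pk, ct, rng):
    """Add a fresh encryption of 0: the plaintext is unchanged, the randomness is fresh."""
    u0, v0 = encrypt(pk, 0, rng)
    u, v = ct
    return [(x + y) % MOD for x, y in zip(u, u0)], (v + v0) % MOD
```

Here the noise of each encryption is at most `ROWS * 4` in absolute value, so after one randomization it stays far below `MOD // 4` and decryption remains correct. The toy does not address the distribution of the noise, which the text handles by assuming SRL security.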


5 Experiments 


We experimentally evaluated the proposed algorithms (REVERSE and BLOCK) and protocol. We pose the following three research questions:


RQ1. Are the proposed algorithms scalable with respect to the size of the mon- 
itored ciphertexts and that of the DFA? 

RQ2. Are the proposed algorithms fast enough in a realistic monitoring sce- 
nario? 

RQ3. Does a standard IoT device have sufficient computational power acting 
as a client in the proposed protocol? 


To answer RQ1, we conducted an experiment with our original benchmark, where the length of the monitored ciphertexts and the size of the DFA are configurable (Sect. 5.1). To answer RQ2 and RQ3, we conducted a case study on blood glucose monitoring; we monitored blood glucose data obtained by simglucose² against specifications taken from [12,38] (Sect. 5.2). To answer RQ3, we measured the time spent on the encryption of plaintexts, which is the heaviest task for a client during the execution of the online protocol.

We implemented our algorithms in C++20. Our implementation is publicly available³. We used Spot [17] to convert a safety LTL formula to a DFA. We also used Spot's utility program ltlfilt to calculate the size of an LTL formula⁴. We used TFHEpp [30] as the TFHE library. We used $N = 1024$ as the size of the message represented by one TRLWE ciphertext, which is a parameter of TFHE. The complete TFHE parameters we used are shown in the full version [4].

For RQ1 and RQ2, we ran experiments on a workstation with Intel Xeon 
Silver 4216 (3.2GHz; 32 cores and 64 threads in total), 128GiB RAM, and 
Ubuntu 20.04.2 LTS. We ran each instance of the experiment setting five times 


2 https://github.com/jxx123/simglucose.

3 Our implementation is uploaded to https://doi.org/10.5281/zenodo.6558657.

4 We desugared a formula by ltlfilt with option --unabbreviate="eFGiMR*W" and counted the number of characters.
Fig. 4. Experimental results of $M_m$. The left figure shows runtimes when the number
of states (i.e., m) is fixed to 500, while the right one is when the number of monitored 
ciphertexts (i.e., n) is fixed to 50000. 


and reported the average. We measured the time to consume all of the monitored 
ciphertexts in the main loop of each algorithm, i.e., in Lines 4-12 in REVERSE 
and in Lines 2-19 in BLOCK. 

For RQ3, we ran experiments on two single-board computers, with and without an Advanced Encryption Standard (AES) [14] hardware accelerator. ROCK64 has ARM Cortex A53 CPU cores (1.5 GHz; 4 cores) with an AES hardware accelerator and 4 GiB RAM. Raspberry Pi 4 has ARM Cortex A72 CPU cores (1.5 GHz; 4 cores) without an AES hardware accelerator and 4 GiB RAM.


5.1 RQ1: Scalability


Experimental Setup. In the experiments to answer RQ1, we used a simple binary DFA $M_m$, which accepts a word $w$ if and only if the number of occurrences of 1 in $w$ is a multiple of $m$. The number of the states of $M_m$ is $m$.
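For concreteness, the benchmark DFA $M_m$ can be written down directly as a modulo-$m$ counter. The sketch below is our plaintext illustration (the names `make_mm` and `run` are hypothetical): state $j$ records the number of 1s read so far modulo $m$, and state 0 is both the initial and the only accepting state.

```python
def make_mm(m):
    """Binary DFA M_m accepting words whose number of 1s is a multiple of m."""
    states = list(range(m))
    delta = {(j, b): (j + b) % m for j in states for b in (0, 1)}
    return states, delta, 0, {0}

def run(dfa, word):
    """Return the acceptance verdict after each prefix of `word`, as in online monitoring."""
    _, delta, q, final = dfa
    verdicts = []
    for b in word:
        q = delta[(q, b)]
        verdicts.append(q in final)
    return verdicts
```

For example, `run(make_mm(3), [1, 0, 1, 1])` returns `[False, False, False, True]`.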

Our experiments are twofold. In the first experiment, we fixed the DFA size $m$ to 500 and increased the size $n$ of the input word $w$ from 10000 to 50000. In the second experiment, we fixed $n = 50000$ and changed $m$ from 10 to 500. The parameters we used are $I_{\mathrm{boot}} = 30000$ and $B = 150$.


Results and Discussion. Figure 4 shows the results of the experiments. In the left plot of Fig. 4, we observe that the runtimes of both algorithms are linear in the length of the monitored ciphertexts. This coincides with the complexity analysis in Sect. 3.4.

In the right plot of Fig. 4, we observe that the runtimes of both algorithms are at most linear in the number of the states. For BLOCK, this coincides with the complexity analysis in Sect. 3.4. For REVERSE, in contrast, this is much more efficient than its worst-case exponential complexity with respect to $|Q|$. This is because the reversed DFA does not blow up for this benchmark.

In both plots of Fig. 4, we observe that REVERSE is faster than BLOCK. Moreover, in the left plot of Fig. 4, the curve of BLOCK is steeper than that of REVERSE. This is because 1) the reversed DFA $M_m^R$ has the same size as $M_m$, 2) CIRCUITBOOTSTRAPPING is about ten times slower than BOOTSTRAPPING, and 3) $I_{\mathrm{boot}}$ is much larger than $B$.

Overall, our experiment results confirm the complexity analysis in Sect. 3.4. 
Moreover, the practical scalability of REVERSE with respect to the DFA size 
is much better than the worst case, at least for this benchmark. Therefore, we 
answer RQ1 affirmatively. 
5.2 RQ2 and RQ3: Case Study on Blood Glucose Monitoring


Experimental Setup. To answer RQ2, we applied REVERSE and BLOCK to 
the monitoring of blood glucose levels. The monitored values are generated by 
simulation of type 1 diabetes patients. We used the LTL formulae in Table 4. 
These formulae were originally presented as signal temporal logic [28] formulae [12,38], and we obtained the LTL formulae in Table 4 by discrete sampling.

To simulate blood glucose levels of type 1 diabetes patients, we adopted simglucose, which is a Python implementation of the UVA/Padova Type 1 Diabetes Simulator [29]. We recorded the blood glucose levels every minute⁵ and encoded each of them in nine bits. For $\psi_1$, $\psi_2$, $\psi_4$, we used 720 min of the simulated values. For $\varphi_1$, $\varphi_4$, $\varphi_5$, we used seven days of the values. The parameters we used are $I_{\mathrm{boot}} = 30000$ and $B = 9$.

To answer RQ3, we encrypted plaintexts into TRGSW ciphertexts 1000 times 
using two single-board computers (ROCK64 and Raspberry Pi 4) and reported 
the average runtime. 


Results and Discussion (RQ2). The results of the experiments are shown in Table 5. The result for $\psi_4$ with REVERSE is missing because the reversed DFA for $\psi_4$ is too large, and its construction was aborted due to the memory limit.

Although the reversed DFAs for $\psi_1$ and $\psi_2$ were large, in all the cases we observe that both REVERSE and BLOCK took at most 24 s to process each blood glucose value on average. This is partly because $|Q|$ and $|Q^R|$ are not as large as the upper bounds described in Sect. 3.4, i.e., doubly and singly exponential in $|\varphi|$, respectively. Since each value is recorded every minute, at least on average, both algorithms finished processing each value before the next measured value arrived, i.e., no congestion occurred. Therefore, our experiment results confirm that, in a practical scenario of blood glucose monitoring, both of our proposed algorithms are fast enough to be used in the online setting, and we answer RQ2 affirmatively.
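The real-time argument above is arithmetic on Table 5: the mean runtime per value is the total runtime divided by the number of processed values, and it has to stay below the one-minute (60,000 ms) sampling interval. The sketch below recomputes this for a subset of the rows, with the totals copied from Table 5 (the formula keys are ASCII stand-ins for $\psi_i$ and $\varphi_i$); the recomputed means agree with the table's last column up to rounding of the reported totals.

```python
SAMPLING_INTERVAL_MS = 60_000  # one blood glucose value per minute

# (formula, algorithm) -> (total runtime in s, number of values), copied from Table 5
TOTALS = {
    ("psi2", "REVERSE"): (17035.05, 721),
    ("psi2", "BLOCK"): (131.53, 721),
    ("psi4", "BLOCK"): (35.42, 721),
    ("phi1", "REVERSE"): (22.33, 10081),
    ("phi1", "BLOCK"): (1741.15, 10081),
    ("phi5", "BLOCK"): (2084.50, 10081),
}

def mean_ms_per_value(total_s, n_values):
    """Average processing time per monitored value, in milliseconds."""
    return 1000.0 * total_s / n_values

means = {k: mean_ms_per_value(*v) for k, v in TOTALS.items()}
worst = max(means, key=means.get)  # slowest (formula, algorithm) pair in this subset
print(worst, means[worst])
print(all(ms < SAMPLING_INTERVAL_MS for ms in means.values()))
```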

We also observe that the average runtimes of $\psi_1, \psi_2, \psi_4$ and $\varphi_1, \varphi_4, \varphi_5$ with BLOCK are comparable, although the monitoring DFAs of $\psi_1, \psi_2, \psi_4$ are significantly larger than those of $\varphi_1, \varphi_4, \varphi_5$. This is because the numbers of the reachable states during execution are similar among these cases (from 1 up to 27 states). As we mentioned in Sect. 3.4, BLOCK only considers the states reachable by a word of length $i$ when the $i$-th monitored ciphertext is consumed, and thus, it ran fast even when the monitoring DFA is large.


Results and Discussion (RQ3). It took 40.41 ms and 1470.33 ms on average to encrypt a blood glucose value (i.e., nine bits) on ROCK64 and Raspberry Pi 4, respectively. Since each value is sampled every minute, our experiment results confirm that both machines are fast enough to be used in an online setting. Therefore, we answer RQ3 affirmatively.


5 Current continuous glucose monitors (e.g., Dexcom G4 PLATINUM) record blood 
glucose levels every few minutes, and our sampling interval is realistic. 
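Table 4. The safety LTL formulae used in our experiments. $\psi_1$, $\psi_2$, $\psi_4$ are originally from [12], and $\varphi_1$, $\varphi_4$, and $\varphi_5$ are originally from [38].

Name | LTL formula
$\psi_1$ | $G_{[100,700]}(p_8 \lor p_9 \lor (p_4 \land p_7) \lor (p_5 \land p_7) \lor (p_6 \land p_7) \lor (p_2 \land p_3 \land p_7))$
$\psi_2$ | $G_{[100,700]}(\neg p_9 \lor (\neg p_7 \land \neg p_8) \lor (\neg p_5 \land \neg p_6 \land \neg p_8) \lor (\neg p_4 \land \neg p_6 \land \neg p_8) \lor (\neg p_3 \land \neg p_6 \land \neg p_8) \lor (\neg p_2 \land \neg p_6 \land \neg p_8) \lor (\neg p_1 \land \neg p_6 \land \neg p_8))$
$\psi_4$ | $G_{[600,700]}((\neg p_8 \land \neg p_9) \lor (\neg p_7 \land \neg p_9) \lor (\neg p_4 \land \neg p_5 \land \neg p_6 \land \neg p_9) \lor (\neg p_1 \land \neg p_2 \land \neg p_3 \land \neg p_5 \land \neg p_6 \land \neg p_9))$
$\varphi_1$ | $G((\neg p_6 \land \neg p_7 \land p_8 \land \neg p_9) \lor (\neg p_5 \land \neg p_7 \land p_8 \land \neg p_9) \lor (\neg p_3 \land \neg p_4 \land \neg p_7 \land p_8 \land \neg p_9) \lor (p_4 \land p_7 \land \neg p_8 \land \neg p_9) \lor (p_5 \land p_7 \land \neg p_8 \land \neg p_9) \lor (p_6 \land p_7 \land \neg p_8 \land \neg p_9) \lor (p_1 \land p_2 \land p_3 \land p_7 \land \neg p_8 \land \neg p_9))$
$\varphi_4$ | $G((\neg p_7 \land \neg p_8 \land \neg p_9) \rightarrow F_{[0,25]}(p_7 \lor p_8 \lor p_9))$
$\varphi_5$ | $G((p_9 \lor (p_3 \land p_7 \land p_8) \lor (p_4 \land p_7 \land p_8) \lor (p_5 \land p_7 \land p_8) \lor (p_6 \land p_7 \land p_8)) \rightarrow F_{[0,25]}((\neg p_8 \land \neg p_9) \lor (\neg p_7 \land \neg p_9) \lor (\neg p_3 \land \neg p_4 \land \neg p_5 \land \neg p_6 \land \neg p_9)))$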


Table 5. Experimental results of blood glucose monitoring, where $Q$ is the state space of the monitoring DFA and $Q^R$ is the state space of the reversed DFA.

Formula | $|\varphi|$ | $|Q|$ | $|Q^R|$ | # of blood glucose values | Algorithm | Runtime (s) | Mean runtime (ms/value)
$\psi_1$ | 40963 | 10524 | 2712974 | 721 | REVERSE | 16021. | 22220.62
 | | | | | BLOCK | 132.68 | 184.02
$\psi_2$ | 75220 | 11126 | 2885376 | 721 | REVERSE | 17035.05 | 23626.97
 | | | | | BLOCK | 131.53 | 182.43
$\psi_4$ | 10392 | 7026 | - | 721 | REVERSE | - | -
 | | | | | BLOCK | 35.42 | 49.12
$\varphi_1$ | 195 | 21 | 20 | 10081 | REVERSE | 22.33 | 2.21
 | | | | | BLOCK | 1741.15 | 172.72
$\varphi_4$ | 494 | 237 | 237 | 10081 | REVERSE | 42.23 | 4.19
 | | | | | BLOCK | 2073.45 | 205.68
$\varphi_5$ | 1719 | 390 | 390 | 10081 | REVERSE | 54.87 | 5.44
 | | | | | BLOCK | 2084.50 | 206.78


We also observe that encryption on ROCK64 is more than 35 times faster 
than that on Raspberry Pi 4. This is mainly because of the hardware accelerator 
for AES, which is used in TFHEpp to generate TRGSW ciphertexts. 
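The ratio stated above follows directly from the two averages reported for RQ3. A short check (the constant names are ours), which also compares the slower board against the one-minute sampling interval:

```python
ROCK64_MS = 40.41          # mean time to encrypt one nine-bit value on ROCK64
RPI4_MS = 1470.33          # same on Raspberry Pi 4
SAMPLING_INTERVAL_MS = 60_000

speedup = RPI4_MS / ROCK64_MS             # roughly 36.4
print(f"ROCK64 is {speedup:.1f}x faster")
print(RPI4_MS < SAMPLING_INTERVAL_MS)     # even the slower board meets the 1-minute budget
```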


6 Conclusion 


We presented, to the best of our knowledge, the first oblivious online LTL monitoring protocol. Our protocol allows online LTL monitoring while concealing 1) the client's monitored inputs from the server and 2) the server's LTL specification from the client. We proposed two online algorithms (REVERSE and BLOCK) using an FHE scheme called TFHE. In addition to the complexity analysis, we experimentally confirmed the scalability and practicality of our algorithms with an artificial benchmark and a case study on blood glucose level monitoring.

Our immediate future work is to extend our approaches to LTL semantics 
with multiple values, e.g., LTL3 [6]. Extension to monitoring continuous-time 
signals, e.g., against an STL [28] formula, is also future work. Another future 
direction is to conduct a more realistic case study of our framework with actual 
IoT devices. 
Abstract. The analysis of legacy systems requires the automated extraction of high-level specifications. We propose a framework, called Abstraction Modulo Stability, for the analysis of transition systems operating in stable states, and responding with run-to-completion transactions to external stimuli. The abstraction captures the effects of external stimuli on the system state, and describes them in the form of a finite state
machine. This approach is parametric on a set of predicates of interest 
and the definition of stability. We consider some possible stability defini- 
tions which yield different practically relevant abstractions, and propose 
a parametric algorithm for abstraction computation. The obtained FSM 
is extended with guards and effects on a given set of variables of interest. 
The framework is evaluated in terms of expressivity and adequacy within 
an industrial project with the Italian Railway Network, on reverse engi- 
neering tasks of relay-based interlocking circuits to extract specifications 
for a computer-based reimplementation. 


Keywords: Timed Transition Systems - Property extraction - 
Simulations - Relay-based circuits 


1 Introduction 


The maintenance of legacy systems is known to be a very costly task, and the lack 
of knowledge hampers the possibility of a reimplementation with more modern 
technologies. Legacy systems may have been actively operating for decades, but 
their behavior is known only to a handful of people. It is therefore important to 
have automated means to reverse-engineer and understand their behavior, for 
example in the form of state machines or temporal properties. 

We focus on understanding systems that exhibit self-stabilizing behaviors, i.e. 
that are typically in a stable state, and respond to external stimuli by reaching 
stability in a possibly different state. As an industrially relevant example, con- 
sider legacy Railway Interlocking Systems based on Relay technology (RRIS): 
these are electro-mechanical circuits for the control of railway stations, with 
thousands of components that respond to the requests of human operators to 
activate the shunting routes for the movement of the trains. They support a 
computational model based on “run-to-completion”, where a change in a part 
of the circuit (e.g. a switch closing) may change the power in another part of
the circuit, and in turn operate other switches, until a stable condition is (hope- 
fully) reached. This is very different in spirit from typical “cycle-based” control 
implemented in computer-based systems such as SCADA. 

In this paper, we tackle the problem of extracting abstract specifications of 
the possible behaviors of an infinite-state timed transition system. The idea is 
to understand how the system evolves from a stable state, in response to a given 
stimulus, to the next stable state. In addition, we are interested in knowing 
under which conditions the transitions are possible and what their effects are on selected state variables. All this information is presented in the form of an
extended finite state machine, which can be seen as a collection of temporal 
specifications satisfied by the system. 

We make the following contributions. First, we propose the general framework 
of Abstraction Modulo Stability, a white-box analysis of self-stabilizing systems 
with run-to-completion behavior. The set of abstract states is the grid induced by 
a set of given predicates of interest. The framework is generic and parameterized 
with respect to the notion of stability. Different notions of stability are possible, 
depending on several factors: whether remaining in a region is possible (for some paths) or necessary (for all paths); whether the horizon of persistence in the stable
region is unbounded, or lower-bounded on the number of discrete transitions 
and/or on the actual time. The framework also takes into account the notion 
of reachability in the concrete space, in order to limit the amount of spurious 
behaviors in the abstract description. We illustrate the relations holding between 
the corresponding abstractions, depending on the strength of the selected notion 
of stability. 

Second, we present a practical algorithm to compute stability abstractions. 
We face two key difficulties. In the general case, one abstract transition is associated with a sequence of concrete transitions, of possibly unbounded length, so that a fixpoint must be reached. Furthermore, we need to make sure that the sequence starts from a reachable state. Contrast this with the standard
SMT-based computation of predicate abstractions [15], where one transition in 
the abstract space corresponds to one concrete transition, and reachability is not 
considered. 

Third, we show how to lift to the abstract space other relevant variables from 
the concrete space, so that each abstract transition is associated with guards and 
effects. This results in a richer abstraction where the abstract states (typically 
representing control modes) are complemented by information on the data flow 
of the additional variables (typically representing the actual control conditions 
in a given mode). 

We experimentally evaluate the approach on several large RRIS implement- 
ing the control logic for shunting routes and switch controls. This research is 
strongly motivated by an ongoing activity on the migration of the Italian Rail- 
way Network from relay-based interlocking to computer-based interlocking [3]. 
Stability abstraction is the chosen formalism to reverse engineer the RRIS, and 
to automatically provide the actual specifications for computer-based interlocking. We demonstrate the effectiveness of the proposed algorithms, and the crucial
role of reachability in terms of precision of the abstractions. 


Related Works. This work differs substantially from most of the literature on abstraction. For example, Predicate Abstraction (PA) [11] can be
directly embedded within the framework; furthermore, PA does not take into 
account concrete reachability; finally, an abstract transition is the direct result 
of a concrete transition, and not, as in our case, of a sequence of concrete tran- 
sitions. 

In [5] the authors propose to analyze abstract transitions between invariant 
regions with an approximate approach. In comparison, we propose a general
framework, parameterized on the notion of stability. Additionally, we propose 
effective algorithms to construct automata from concrete behaviors only, and 
that represent symbolically the guards and the effects of the transitions. 

The idea of weak bisimilarity [19], proposed for the comparison of observable 
behaviors of CCS, is based on collapsing sequences of silent, internal actions. 
The main difference with our approach is that weak bisimilarity is not used 
to obtain an abstraction for reverse engineering. Furthermore, in Abstraction 
Modulo Stability, observability is a property of states, and the silent actions are 
collapsed only when passing through unobservable (i.e., unstable) states. 

Somewhat related are the techniques for specification mining, which have been extensively studied in both hardware and software. For example, DAIKON [9] extracts candidate invariant specifications from simulations. In our
approach, the abstraction directly results in temporal properties that are guar- 
anteed to hold on the system being abstracted. Yet, simulation-based techniques 
might be useful to bootstrap the computation of Abstraction Modulo Stability. 

The work in [1] proposes techniques for the analysis of RRIS, assuming that 
a description of the stable states is already given. There are two key differences: 
first, the analysis of transient states is not considered; second, the extraction of 
a description in terms of stable states is a manual (and thus inefficient and error-prone) task. For completeness, we mention the vast literature on the application of formal methods to railway interlocking systems (see e.g. [6,12,13,17,18]).
Aside from the similarity in the application domain, these works are not directly 
related, given their focus on the verification of the control algorithms. 


Structure of the Paper. In Sect. 2 we present the background notions. In Sect. 3 we present the framework of Abstraction Modulo Stability. In Sect. 4 we present the algorithms for computing the abstraction. In Sect. 5 we present the experimental evaluation. In Sect. 6 we draw some conclusions and present directions for future work.


2 Background 


We work in the setting of Satisfiability Modulo Theories (SMT) [4], with quantifier-free first-order formulae interpreted over the theory of Linear Real Arithmetic (LRA). We use $P, Q$ to denote sets of Boolean variables, $p, q$ to denote truth assignments, and the standard Boolean connectives $\land, \lor, \neg, \rightarrow$ for conjunction, disjunction, negation and implication. $\top$ and $\bot$ denote true and false, respectively. For a set of variables $V$, let $\Psi_{\mathcal{T}}(V)$ denote the set of first-order formulae over a theory $\mathcal{T}$ with free variables in $V$. When clear from context, we omit the subscript. Let $V' = \{v' \mid v \in V\}$. For a formula $\phi \in \Psi(V)$, let $\phi'$ denote $\phi[V/V']$, i.e. the substitution of each variable $v \in V$ with $v'$.

A finite state automaton is a tuple $A = (Q, L, Q_0, R)$ where: $Q$ is a finite set of states; $L$ is the alphabet; $Q_0 \subseteq Q$ is the set of initial states; $R \subseteq Q \times L \times Q$ is the labeled transition relation. We also consider automata with transitions annotated by guards and effects expressed as SMT formulae over given sets of variables. For $(q_1, \ell, q_2) \in R$, we write $q_1 \xrightarrow{\ell} q_2$. Let $A_1$ and $A_2$ be two automata defined on the same set of states $Q$ and on the same alphabet $L$ including a label $\tau$: we say that $A_1$ weakly simulates $A_2$, and we write $A_1 \preceq A_2$, if whenever $q \xrightarrow{\ell}_{A_1} q'$, then $q \xrightarrow{\ell}_{A_2} \xrightarrow{\tau^*}_{A_2} q'$, where $\xrightarrow{\tau^*}$ is a (possibly null) sequence of transitions labeled with $\tau$.

A symbolic timed transition system is a tuple $M = (V, C, X, \mathit{Init}, \mathit{Invar}, \mathit{Trans})$, where: $V$ is a finite set of state variables; $C \subseteq V$ is a set of clock variables; $X$ is a finite set of Boolean variables encoding the alphabet; $\mathit{Init}(V)$, $\mathit{Invar}(V)$, $\mathit{Trans}(V, X, V')$ are SMT formulae describing the initial states, the invariant and the transition relation, respectively. The clocks in $C$ are real-valued variables. We restrict the formulae over clock variables to atoms of the form $c \bowtie k$, for $c \in C$, $k \in \mathbb{R}$ and $\bowtie \in \{<, \leq, >, \geq, =\}$. The clock invariants are convex. We allow the other variables in $V$ to be either Boolean or real-valued.

A state is an assignment to the state variables $V$, and we let $S$ denote the set of all the interpretations of $V$. We assume a distinguished clock variable $\mathit{time} \in C$, initialized with $\mathit{time} = 0$ in $\mathit{Init}$, representing the global time.

The system evolves by either a discrete or a timed step. The timed transition entails that there exists $\delta \in \mathbb{R}^{+}$ such that $c' = c + \delta$ for each clock variable $c \in C$, and $v' = v$ for all the other variables.¹ The discrete transition entails that $\mathit{time}' = \mathit{time}$ and can change the other variables instantaneously.

A valid trace $\pi$ is a sequence of states $(s_0, s_1, \ldots)$ that all fulfill the $\mathit{Invar}$ condition, such that $s_0 \models \mathit{Init}$ and, for all $i$, $(s_i, \ell_i, s_{i+1}) \models \mathit{Trans}(V, X, V')$ for some assignment $\ell_i$ to $X$. We denote with $\mathit{Reach}(M)$ the set of states that are reachable by a valid trace in $M$. We adopt a hyper-dense semantics: in a trace $\pi$, time is weakly monotonic, i.e. $s_i.\mathit{time} \leq s_{i+1}.\mathit{time}$. We disregard Zeno behaviors, i.e. every finite run is a prefix of a run in which time diverges.

The states in which time cannot elapse, i.e. which are forced to take an instantaneous discrete transition, are called urgent states. We assume the existence of a Boolean state variable $\mathit{urg} \in V$ which is true in all and only the urgent states. Namely, for every pair of consecutive states $(s_i, s_{i+1})$ in a path $\pi$ where $s_i.\mathit{urg}$ is true, we have $s_i.\mathit{time} = s_{i+1}.\mathit{time}$.
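The two timing constraints above (weakly monotonic time under the hyper-dense semantics, and no time elapse when leaving an urgent state) can be checked on a finite explicit trace. The sketch below assumes, for illustration only, that states are Python dicts with keys `time` and `urg`; the function name `check_timing` is hypothetical, and the check covers only these two conditions, not $\mathit{Init}$, $\mathit{Invar}$, or $\mathit{Trans}$.

```python
def check_timing(trace):
    """Check weak monotonicity of `time` and urgency on a finite trace of dict states."""
    for s, t in zip(trace, trace[1:]):
        if t["time"] < s["time"]:
            return False   # time must not decrease (it may stay equal: hyper-dense)
        if s["urg"] and t["time"] != s["time"]:
            return False   # an urgent state must be left by an instantaneous step
    return True
```

For example, a trace whose times are 0, 2.5, 2.5, 3 with only the second state urgent passes the check, whereas a trace in which an urgent state at time 1 is followed by a state at time 2 does not.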


1 We abuse the notation and write $P = Q$ for $P \leftrightarrow Q$ when $P$ and $Q$ are Boolean variables.
We consider CTL+P [16], a branching-time temporal logic with the future and past temporal operators. A history $h = (s_0, \ldots, s_n)$ for $M$ is a finite prefix of a trace of $M$. For a CTL+P formula $\psi$, we write $M, h \models \psi$ meaning that, after $h$, $s_n$ satisfies $\psi$ in $M$. Operators $AG\,\psi$, $E(\psi_1\,U\,\psi_2)$, $H\,\psi$ are used with their standard interpretations (in every future $\psi$ will always hold; there exists a future in which $\psi_1$ holds until $\psi_2$; in the current history $\psi$ always held; respectively).
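On a finite explicit-state model, the three operators above can be evaluated with the textbook fixpoint characterizations for $AG$ and $EU$ and with a direct scan of the history for $H$. The sketch below is our own illustration under simplifying assumptions: an explicit successor map in which every state has at least one successor, and propositions given as sets of states. The paper's setting is symbolic and possibly infinite-state, and the function names here are hypothetical.

```python
def sat_AG(succ, p):
    """States satisfying AG p: greatest fixpoint of Z = p and AX Z (succ must be total)."""
    Z = set(p)
    while True:
        nz = {s for s in Z if all(t in Z for t in succ[s])}
        if nz == Z:
            return Z
        Z = nz

def sat_EU(succ, p1, p2):
    """States satisfying E(p1 U p2): least fixpoint of Z = p2 or (p1 and EX Z)."""
    Z = set(p2)
    while True:
        nz = Z | {s for s in p1 if any(t in Z for t in succ[s])}
        if nz == Z:
            return Z
        Z = nz

def holds_H(history, p):
    """H p after the history (s_0, ..., s_n): p held in every state of the history."""
    return all(s in p for s in history)
```

With `succ = {0: [1], 1: [2], 2: [2], 3: [3]}`, for instance, `sat_EU(succ, {0, 1}, {2})` yields `{0, 1, 2}`.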


3 The Framework of Abstraction Modulo Stability 


3.1 Overview 


We tackle the problem of abstracting a concrete system in order to mine relevant 
high-level properties about its behavior. 

We are interested in how the system reacts to stimuli: when an action is 
performed, we want to skip the intermediate steps that are necessary to accom- 
plish an induced effect, and evaluate how stable conditions are connected to each 
other. The definition of stability is the core filter that defines which states we 
want to observe when following a run-to-completion process, i.e., the run trig- 
gered by a stimulus under the assumption that the inputs remain stationary. In 
practice, several definitions of stability are necessary, each of them corresponding 
to a different level of abstraction. 

An additional element of the desired abstraction is that relevant properties concern particular evaluations of the system. We consider a given abstract space, which intuitively holds the observable evaluations of the system, onto which we project the concrete states.

In this section we describe a general framework for Abstraction Modulo Sta- 
bility, which is parametric with respect to the abstract domain and the definition 
of stability. The result will be a finite state system which simulates the original 
model, by preserving only the stable way-points on the abstract domain, and by 
skipping the transient (i.e., unstable and unobservable) states. 

Finally, we define how the obtained abstract automata can be enriched with 
guards and effects for each transition. 


Example 1. Consider as running example the timed transition system S shown on the right-hand side of Fig. 1, which models a tank receiving a constant incoming flow of water, with an automatic safety valve.

S has a clock variable $c$ which tracks the timing of the filling and emptying processes, and reads an input Boolean variable in.flow. The status of this variable is controlled by the environment $\mathcal{E}$, shown on the left-hand side of the figure. In the transition relation of $\mathcal{E}$, the variables in $\Sigma$ encode the labels for the stimuli, which are variations of the input variable in.flow. In particular, if $\Sigma = \tau$, then in.flow is unchanged, and we say that the system S is not receiving any stimulus. S reacts according to the updated in.flow'. The discrete transitions of S are labeled with guards and with resetting assignments on the clock variable (in the form [guards]/resets). The system starts in the Empty location. A discrete transition reacts to a true in.flow by jumping to Filling and resetting $c' := 0$. The invariant $c \le 10$ of Filling forces the system to transit to a Warning location after 10 time units, corresponding to the time needed to reach a critical level. Warning is urgent: as soon as S reaches this state, it is forced to take the next discrete transition. The urgency of location Warning models the causality relation between the evaluation of the water level and the instantaneous opening of a safety valve. Due to the latter, in location Full the system dumps all the incoming water and keeps the water level stable. If the input is closed, S transits to Emptying. In this condition, water is discharged faster: after 2 time units the system is again in Empty. Transitions between Filling and Emptying describe the system's reaction to a change of the input during a filling or emptying process.

Fig. 1. A timed transition system representing a tank of water. (Left: environment $\mathcal{E}$ with $Trans_{\mathcal{E}}(in.flow, \Sigma, in.flow')$, where $\Sigma = \tau \rightarrow (in.flow' = in.flow)$, $\Sigma = open \rightarrow in.flow'$, $\Sigma = close \rightarrow \neg in.flow'$. Right: system S with $Trans_S(V, in.flow', V')$.)

We consider as predicates of interest exactly the five locations of the system. 
The stability abstraction of the composed system is meant to represent the stable 
conditions reached after the triggering events defined by X. 


3.2 Abstraction Modulo Stability 


Consider a symbolic timed transition system $M = (X, C, \Sigma, Init, Invar, Trans)$ whose discrete transitions are labeled by assignments to $\Sigma$ representing stimuli. A stimulus corresponds to a variation of some variables $I \subseteq X$, which we call input variables. Namely, we can picture $M$ as a closed system partitioned into an environment $\mathcal{E}$, which changes the variables $I$, and an open system $S$, which reads the updated values of the variables $I$ and reacts accordingly:
$$Trans(X, \Sigma, X') = Trans_{\mathcal{E}}(I, \Sigma, I') \wedge Trans_S(V, I', V'), \quad \text{with } V = X \setminus I.$$

In particular, we assume a distinguished assignment $\tau$ to the labels $\Sigma$, corresponding to the absence of stimuli: $Trans_{\mathcal{E}}[\Sigma/\tau] = (I \leftrightarrow I')$. The transition labeled with $\tau$ is the silent or internal transition. It corresponds to the discrete changes which keep the inputs stationary (i.e., unchanged) and to the timed transitions. We write $M^\tau$ for the restriction of $M$ which evolves only with the silent transition $\tau$, i.e., under the assumption that no external interrupting action is performed on $S$, so that $I \leftrightarrow I'$ is entailed by the transition relation. We assume that $M$ is never blocked waiting for an external action: this makes $M^\tau$ always responsive to the $\tau$ transition. Moreover, we assume that Zeno behaviors are not introduced by this restriction.

We define a framework for abstracting $M$, parametric on an abstract domain $\Phi$ and a stability definition $\sigma$.


Abstract Domain. Among the variables of the system $M$, consider a set of Boolean variables $P \subseteq X$ representing important predicates. The abstract domain $\Phi$ is the domain of the Boolean combinations of the variables in $P$.


Stability Definition. Let $\sigma(X)$ be a CTL+P formula providing a stability criterion.


Definition 1 ($\sigma$-Stability). A concrete state $s$ with history $h = (s_0, \dots, s)$ is $\sigma$-stable if and only if $M^\tau, h \models \sigma$.


Note that stability is evaluated in $M^\tau$, i.e. under the assumption that the inputs are stationary: at the reception of an external stimulus, a $\sigma$-stable state might move to a new concrete state which does not satisfy $\sigma$. We say that a state $s$ is $\sigma$-stable in a region $p \in \Phi$ if it is $\sigma$-stable and $s \models p$.

The states for which $M^\tau, (s_0, \dots, s) \not\models \sigma$ are called $\sigma$-unstable. These states might be transient during a convergence process which leads to the next stable state. In the following, we omit the prefix $\sigma$ when clear from context.


Definition 2 (Abstraction Modulo $\sigma$-Stability). Given a concrete system $M = (X, C, \Sigma, Init, Invar, Trans)$, with $P \subseteq X$ Boolean variables, the abstraction modulo $\sigma$-stability of $M$ is a finite-state automaton $A_\sigma = (\Phi, 2^\Sigma, Init_\sigma, Trans_\sigma)$. For each $p_0 \in \Phi$, $p_0 \models Init_\sigma$ if and only if there exists a state $s_0 \in S$ such that $s_0 \models Init$ and, with $h_0 = (s_0)$,
$$M^\tau, h_0 \models E(\neg\sigma\,U\,(\sigma \wedge p_0)).$$


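For each $p_1, p_2 \in \Phi$ and $\ell \in 2^\Sigma$, the triple $(p_1, \ell, p_2) \models Trans_\sigma$ if and only if there exist states $s_0, s_1, s_2 \in S$ and histories $h_1 = (s_0, \dots, s_1)$, $h_2 = (s_2)$ such that $(s_1, \ell, s_2) \models Trans$ and
$$M^\tau, h_1 \models \sigma \wedge p_1, \qquad M^\tau, h_2 \models E(\neg\sigma\,U\,(\sigma \wedge p_2)).$$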


The abstract automaton $A_\sigma$ simulates with a single abstract transition a run of the concrete system $M$ that connects two $\sigma$-stable states with a single event and possibly multiple steps of internal $\tau$ transitions. We call such a convergence process a run-to-completion triggered by the initial event.

Observe that the abstraction is driven by the definition of $\sigma$-stability. It preserves only the abstract regions in which there is a $\sigma$-stable state. The transient states are not exposed, hence the behaviors of $M$ in which a new external stimulus interrupts a convergence still in progress are also disregarded. In other words, the abstraction represents the effects of stimuli accepted only in stable conditions.

In this way, $A_\sigma$ satisfies invariant properties that would be violated in the $\sigma$-unstable states traversed transiently along an internal run-to-completion.
Reachability-Aware Abstraction. Abstractions modulo stability can be tightened by considering only the concrete states that are reachable in $M$. In fact, in the setting of reverse engineering, considering unreachable states may result in an abstraction that includes impossible behaviors with no counterpart in the concrete space. This is done by requiring the first state of $h_1$ in Definition 2 to be reachable in $M$. This option is orthogonal to the choice of the stability definition $\sigma$.
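To make Definition 2 and its reachability-aware variant concrete, the following Python sketch computes the abstract transitions of a finite, explicit-state system. It is an illustration under simplifying assumptions, not the symbolic procedure of Sect. 4: the stability criterion is assumed to depend only on the current state (as $\sigma_1$ and $\sigma_2$ of Sect. 3.3 do), time is abstracted away, the silent label is the hypothetical string "tau", and the transition list is a hypothetical untimed rendering of the tank of Example 1 in which self-loops stand for elapsing time and "Broken" is an invented unreachable state.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional, Set, Tuple

State = Hashable
Edge = Tuple[State, str, State]  # (source, label, target); "tau" is the silent label


def reachable(init: Iterable[State], trans: Iterable[Edge]) -> Set[State]:
    """States reachable from the initial ones in M, with any label."""
    succ = {}
    for s, _label, t in trans:
        succ.setdefault(s, set()).add(t)
    seen = set(init)
    queue = deque(seen)
    while queue:
        for v in succ.get(queue.popleft(), ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def stable_targets(s: State, tau_succ, stable: Callable[[State], bool]) -> Set[State]:
    """Stable states t reached from s by a tau-path whose earlier states are all
    unstable: the witnesses of E(not sigma U sigma) starting in s."""
    found, seen, queue = set(), {s}, deque([s])
    while queue:
        u = queue.popleft()
        if stable(u):
            found.add(u)
            continue  # the until is fulfilled: do not walk past a stable state
        for v in tau_succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return found


def abstract_transitions(trans: Iterable[Edge], stable: Callable[[State], bool],
                         proj: Callable[[State], Hashable],
                         init: Optional[Iterable[State]] = None) -> Set[tuple]:
    """Triples (p1, label, p2) of A_sigma; pass init for the reachability-aware variant."""
    trans = list(trans)
    tau_succ = {}
    for s, label, t in trans:
        if label == "tau":
            tau_succ.setdefault(s, set()).add(t)
    allowed = reachable(init, trans) if init is not None else None
    result = set()
    for s1, label, s2 in trans:
        if not stable(s1) or (allowed is not None and s1 not in allowed):
            continue  # s1 must be a (reachable) stable state
        for t in stable_targets(s2, tau_succ, stable):
            result.add((proj(s1), label, proj(t)))
    return result


# Hypothetical untimed tank: states are locations, self-loops stand for elapsing time.
TRANS = [
    ("Empty", "tau", "Empty"), ("Empty", "open", "Filling"),
    ("Filling", "tau", "Filling"), ("Filling", "tau", "Warning"),
    ("Filling", "close", "Emptying"), ("Warning", "tau", "Full"),
    ("Full", "tau", "Full"), ("Full", "close", "Emptying"),
    ("Emptying", "tau", "Emptying"), ("Emptying", "tau", "Empty"),
    ("Emptying", "open", "Filling"),
    ("Broken", "open", "Full"),  # invented state, unreachable from Empty
]
```

With the trivial criterion `lambda s: True` the triple ("Filling", "tau", "Warning") appears; with `lambda s: s != "Warning"` the Warning location disappears and Filling is connected directly to Full. Passing `init=["Empty"]` drops the spurious triple leaving "Broken", which is the effect of the reachability-aware option.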


3.3 Instantiating the Framework 


The level of abstraction of $A_\sigma$, i.e., the set of disregarded behaviors, is directly induced by the chosen definition of $\sigma$. Its adequacy depends on both the application domain and the objective of the analysis. We now explore some possibilities that we consider relevant in practice.


Predicate Abstraction. Firstly, we show that the Abstraction Modulo Stability framework covers the well-known predicate abstraction [11,14]. With the trivial stability condition
$$\sigma_1 = \top,$$
every concrete state $s$ is stable and is projected onto the abstract region it belongs to ($p = \exists (X \setminus P) . s$). In this way, all concrete transitions (including the timed ones) are reflected in the corresponding $A_{\sigma_1}$.


Non-urgent Abstraction. Urgent states are the ones in which time cannot elapse and which are forced to take a discrete transition. They are usually exploited to decompose a complex action made of multiple steps and to faithfully model the causality along a cyclical chain of events. Unfortunately, by construction, urgent states introduce transient conditions which may be physically irrelevant. In practice, in the analysis of the system's behaviors, one may want to disregard the intermediate steps of a complex instantaneous action.

To this end, we apply the Abstraction Modulo Stability framework and keep only the states in which time can elapse for an (arbitrarily small) time bound $T$:
$$\sigma_2(X) = \neg urg.$$


The obtained abstract automaton $A_{\sigma_2}$ has transitions that correspond to instantaneous run-to-completion processes, skipping urgent states until time is allowed to elapse.


Example 2. On the left-hand side of Fig. 2 we show the abstraction of the tank system obtained using $\sigma_1$. An abstract transition connects two predicates (recall that in this example the predicates correspond to concrete locations) if they are connected in S by either a discrete or a timed transition.

On the right-hand side of Fig. 2 we show the abstraction obtained using $\sigma_2$. With respect to $A_{\sigma_1}$, here location Warning is missing, since time cannot elapse in it, and an abstract transition connects Filling directly to Full.
Fig. 2. Abstractions modulo $\sigma_1$ and $\sigma_2$ on the tank running example.


Eq-predicate Abstractions. Let Eq(P) be a formula expressing implicitly that the 
interpretations of the abstract predicates are not changing during a transition 
(either a discrete or a timed step). 

We now address the intuitive definition: "a stable state is associated with behaviors that preserve the abstract predicates for enough time, i.e., if the system is untouched, then the predicates do not change value for a sufficient time interval". One can choose to measure the permanence of $s$ in $p \in \Phi$ in terms of number of steps (e.g., at least $K$ concrete steps, with $K \in \mathbb{N}_+$), or in terms of continuous time (e.g., for at least $T$ time, with $T \in \mathbb{R}_+$), or both.

This intuitive definition can be interpreted both backward and forward. In 
this paragraph we illustrate the backward perspective. 

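Consider the doubly bounded definition
$$\sigma_3^{T,K}(X) = H^{\ge T}_{\ge K}\, Eq(P),$$
where $M^\tau, h \models \sigma_3^{T,K}$ if and only if $h = (s_0, \dots, s_i)$ with $i \ge K$ and, for some $p \in \Phi$,
$$s_{i-K} \models p, \; \dots, \; s_i \models p \quad \text{and} \quad s_i.time - s_{i-K}.time \ge T.$$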


Such a characterization of stability captures the states that have been in the same predicate assignment for at least K steps, during which at least T time has elapsed. Several variants of this definition are possible, e.g. by using only one bound.

This definition is referred to as backward since we consider the history of the 
system: a stable state has a past trajectory that remained in the same abstract 
region for enough time/steps. It is practically relevant in contexts where it is 
useful to highlight the dwell time of the system in a given condition. The only 
visible behaviors are the ones that were exposed for sufficient time/steps. 

It can be easily seen that if a history $h$ satisfies $\sigma_3^{T_2,K}$, then it also satisfies $\sigma_3^{T_1,K}$ for every $T_1 \le T_2$.

Notably, for the instantiation of $\sigma_3$ with $K = 1$, a state is stable if it has just completed a timed transition elapsing at least $T$ time. In the following, we omit the superscript $K$ from $\sigma_3^{T,K}$ when $K = 1$. If a history $h$ satisfies $\sigma_3^T$, then it also satisfies $\sigma_2$. Namely, while every urgent state (i.e., a transient state for $\sigma_2$) is transient also for $\sigma_3^T$, under $\sigma_3^T$ the non-urgent states that are accidentally traversed in 0 time also become transient, for example because an exiting discrete transition is immediately enabled.

Fig. 3. Abstractions modulo $\sigma_3^{T=7}$ and $\sigma_4$ on the tank running example.
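The backward criterion $\sigma_3^{T,K}$ can be checked directly on a finite history. The sketch below is a minimal Python illustration, assuming a history is given as a list of (set of true predicates, value of time) pairs; the tank history is hypothetical, chosen to mirror Example 1 (10 time units in Filling, then Warning and Full entered in zero time).

```python
from typing import FrozenSet, Sequence, Tuple

# One history entry: (predicates true in the state, value of the time variable).
Step = Tuple[FrozenSet[str], float]


def sigma3(history: Sequence[Step], T: float, K: int = 1) -> bool:
    """H^{>=T}_{>=K} Eq(P) at the last state s_i of the history (s_0, ..., s_i)."""
    i = len(history) - 1
    if i < K:
        return False  # fewer than K past steps: the step bound cannot be met
    window = history[i - K:]  # s_{i-K}, ..., s_i
    same_region = all(preds == window[0][0] for preds, _ in window)
    elapsed = window[-1][1] - window[0][1]
    return same_region and elapsed >= T


def F(*names: str) -> FrozenSet[str]:
    return frozenset(names)


# Hypothetical tank history: "open" at t = 0, Filling for 10 time units, then the
# urgent Warning and the switch to Full in zero time, then Full for 5 time units.
H = [(F("Empty"), 0.0), (F("Filling"), 0.0), (F("Filling"), 10.0),
     (F("Warning"), 10.0), (F("Full"), 10.0), (F("Full"), 15.0)]
```

In this history Filling is $\sigma_3^{T=7}$-stable but not $\sigma_3^{T=15}$-stable, Warning never is, and Full becomes stable only once time has elapsed in it; raising $T$ can only remove stable states, as stated above.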


Future Eq-predicate Abstractions. In contrast to the backward evaluation of $\sigma_3$, one can assess stability forward, by looking at the future(s)² of the state. A possible definition in this perspective is
$$\sigma_4(X) = AG\,Eq(P),$$
asking that, as long as only $\tau$ transitions are taken, the system will never change the evaluation of the predicates. Namely, once a state is $\sigma_4$-stable, it can change the predicates only with an external event, and the abstract states in $A_{\sigma_4}$ are closed under $\tau$ transitions. This is similar in spirit to the notion of P-stable abstraction of [5], with the difference that the latter considers arbitrary regions.

Within this perspective, alternative definitions can be obtained by interchanging the existential/universal path quantifiers (e.g., $EG\,Eq(P)$ characterizes a state for which there exists a future that never changes the predicate evaluations), or by bounding the "globally" operator (e.g., $AG^{\le K} Eq(P)$ captures a state which is guaranteed to expose the same evaluations of predicates in the next $K$ steps). Observe that all these variants would assess the $\sigma$-stability of a state before it has actually proven to expose the same predicates for enough time/steps.
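For a finite, explicit $\tau$-graph, $AG\,Eq(P)$ reduces to checking that every state reachable from $s$ through silent transitions agrees with $s$ on the predicates, and the bounded variant $AG^{\le K} Eq(P)$ limits that search to $K$ steps. A Python sketch, assuming a dictionary of $\tau$-successors and a hypothetical untimed tank graph whose self-loops stand for elapsing time:

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Optional


def ag_eq(s: Hashable, tau_succ: Dict[Hashable, Iterable[Hashable]],
          preds: Callable[[Hashable], Hashable], bound: Optional[int] = None) -> bool:
    """AG Eq(P) when bound is None, AG^{<=K} Eq(P) when bound == K.

    Breadth-first search over silent transitions only; every visited state must
    agree with s on the predicates."""
    target = preds(s)
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        u, depth = queue.popleft()
        if preds(u) != target:
            return False
        if bound is not None and depth == bound:
            continue  # do not look more than K steps ahead
        for v in tau_succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append((v, depth + 1))
    return True


# Hypothetical untimed tau-graph of the tank (self-loops stand for elapsing time).
TAU = {"Empty": ["Empty"], "Filling": ["Filling", "Warning"], "Warning": ["Full"],
       "Full": ["Full"], "Emptying": ["Emptying", "Empty"]}
```

On this graph both the unbounded and the 1-step bounded check keep only Empty and Full. Because the graph ignores clock constraints, this is only a rough stand-in for evaluating $\sigma_4$ on the timed model.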


Example 3. On the left-hand side of Fig. 3 we show the abstraction obtained with the $\sigma_3$ definition, using $T = 7$ and $K = 1$. State Emptying is unstable, since the system cannot remain in it for $T$ time: namely, from Full, at the reception of the stimulus which closes in.flow, all the $\tau$-paths lead to Empty in less than $T$ time. On the other hand, Filling is kept, since the system may stay in this location for enough time to be considered relevant.

On the right-hand side of Fig. 3 we show the abstraction obtained with $\sigma_4$. Here, the stable states are only Empty and Full: the others are abstracted away since they are not invariant for the internal $\tau$ transition. Each external event leads directly to the end of a timed process which converges in the next stable state. Note that in this setting, abstract transitions labeled with $\tau$ can only be self-loops.


2 Note that, in contrast to the backward case where the past is unique, in the forward case we adopt a branching-time view with multiple futures.
Here, $A_{\sigma_4}$ corresponds to the P-stable abstraction because the chosen abstract domain $\Phi$ is able to express the "minimally stable" regions [5] of $M$.

Observe that $A_{\sigma_4}$ would also be obtained by increasing the time bound of $\sigma_3^T$, e.g., with $T = 15$.


As the examples show, different stability definitions induce abstract automata with different numbers of states and transitions. The following proposition states the effect on the abstract automata of making the stability definition stricter. Let us write $p_1 \xrightarrow{\ell}_\sigma p_2$ meaning that $(p_1, \ell, p_2) \models Trans_\sigma$ in $A_\sigma$.


Proposition 1. Let $\sigma$ and $\sigma'$ be two stability definitions such that every history that is $\sigma$-stable is also $\sigma'$-stable, and let $A_\sigma$ and $A_{\sigma'}$ be the corresponding abstractions modulo stability of the same concrete model $M$. Then, $A_{\sigma'}$ weakly simulates $A_\sigma$.


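Proof. By definition, if $p_1 \xrightarrow{\ell}_\sigma p_2$, then there exists $(s_1, \ell, s_2) \models Trans$ with (1) $M^\tau, h_1 \models \sigma \wedge p_1$ and (2) $M^\tau, h_2 \models E(\neg\sigma\,U\,(\sigma \wedge p_2))$, where $h_1 = (s_0, \dots, s_1)$ and $h_2 = (s_2)$. Since every $\sigma$-stable history is also $\sigma'$-stable, from (1) we obtain $M^\tau, h_1 \models \sigma' \wedge p_1$, and from (2) we derive
$$M^\tau, h_2 \models EF(\sigma \wedge p_2) \;\Rightarrow\; M^\tau, h_2 \models EF(\sigma' \wedge p_2)$$
$$\Rightarrow\; M^\tau, h_2 \models E(\neg\sigma'\,U\,(\sigma' \wedge EX\,E(\neg\sigma' \dots U\,(\sigma' \wedge p_2) \dots))).$$
Hence, $p_1 \xrightarrow{\ell}_{\sigma'} \xrightarrow{\tau}{}^{*}_{\sigma'} p_2$ and $A_\sigma \lesssim A_{\sigma'}$.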
Corollary 1. For all bounds $T_1 < T_2 \in \mathbb{R}_+$,
$$A_{\sigma_1} \gtrsim A_{\sigma_2} \gtrsim A_{\sigma_3^{T_1}} \gtrsim A_{\sigma_3^{T_2}}.$$
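Proposition 1 can be checked on a small example. The Python sketch below rebuilds $A_\sigma$ for a finite system whose states coincide with the predicate valuations (an assumption that holds for the tank, where predicates are locations), using a state-based stability test and the hypothetical silent label "tau", and then checks the weak simulation used in the proof: every step $p_1 \xrightarrow{\ell} p_2$ of the stricter abstraction must be matched by $p_1 \xrightarrow{\ell} q \xrightarrow{\tau}{}^{*} p_2$ in the coarser one. The tank transitions are a hypothetical untimed rendering of Example 1.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, label, target); "tau" is silent


def abstract(trans: Iterable[Edge], stable: Callable[[Hashable], bool]) -> Set[Edge]:
    """A_sigma for a system whose states are already predicate valuations."""
    trans = list(trans)
    tau = {}
    for s, lab, t in trans:
        if lab == "tau":
            tau.setdefault(s, set()).add(t)

    def targets(s):  # stable ends of tau-paths from s through unstable states only
        out, seen, todo = set(), {s}, [s]
        while todo:
            u = todo.pop()
            if stable(u):
                out.add(u)
                continue
            for v in tau.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    todo.append(v)
        return out

    return {(s1, lab, t) for s1, lab, s2 in trans if stable(s1) for t in targets(s2)}


def weakly_simulates(big: Set[Edge], small: Set[Edge]) -> bool:
    """Every p1 -ev-> p2 of `small` is matched in `big` by p1 -ev-> q -tau*-> p2."""
    def tau_closure(qs):
        seen, todo = set(qs), list(qs)
        while todo:
            u = todo.pop()
            for p, lab, r in big:
                if p == u and lab == "tau" and r not in seen:
                    seen.add(r)
                    todo.append(r)
        return seen

    return all(p2 in tau_closure({r for p, lab, r in big if p == p1 and lab == ev})
               for p1, ev, p2 in small)


# Hypothetical untimed tank: self-loops stand for elapsing time.
TRANS = [
    ("Empty", "tau", "Empty"), ("Empty", "open", "Filling"),
    ("Filling", "tau", "Filling"), ("Filling", "tau", "Warning"),
    ("Filling", "close", "Emptying"), ("Warning", "tau", "Full"),
    ("Full", "tau", "Full"), ("Full", "close", "Emptying"),
    ("Emptying", "tau", "Emptying"), ("Emptying", "tau", "Empty"),
    ("Emptying", "open", "Filling"),
]
```

Taking $\sigma_1$ (`lambda s: True`), $\sigma_2$ (`lambda s: s != "Warning"`) and a $\sigma_4$-like test (`lambda s: s in {"Empty", "Full"}`), each coarser abstraction weakly simulates the stricter ones, while the converse fails, e.g. the $\sigma_4$-like abstraction has no transition leaving Filling.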


3.4 Extending with Guards and Effects 


Abstract transitions in $A_\sigma$ are labeled with the stimulus that has triggered the abstracted run-to-completion process. Recall that a stimulus $\ell \in 2^\Sigma$ is connected to a (possibly null) variation of the inputs $I$ by $Trans_{\mathcal{E}}(I, \ell, I')$. A guard for an abstract transition $(p_1, \ell, p_2)$ is a formula on the $I'$ variables, entailed by $Trans_{\mathcal{E}}[\Sigma/\ell]$, which describes the configurations of inputs that, starting from $p_1$ with event $\ell$, lead to $p_2$. In order to enrich the description of the effects of an abstract transition, we also consider a subset of state variables $O \subseteq V$, called output variables. Observe that an abstract transition may be witnessed by multiple concrete paths, each with its own configuration of inputs and outputs. Hence, we can keep track of a precise correlation between guards and effects with a single relational formula on the $I$ and $O$ variables. This formula is obtained as a disjunction of all the configurations of inputs and outputs in the concrete states reaching stability in $p_2$ (since the configuration of $I$ set by the stimulus is preserved by $\tau$ along the run-to-completion process).
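Since each witness of an abstract transition contributes one configuration of inputs and outputs, the annotation is a disjunction of such configurations. A Python sketch of that bookkeeping, assuming the witnesses are already available as tuples and that all annotated variables are Boolean; the output variable valve_open is hypothetical (the tank model of Example 1 declares no outputs), and the witnesses themselves are invented for illustration:

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Literal = Tuple[str, bool]
Config = Tuple[FrozenSet[Literal], FrozenSet[Literal]]  # (inputs, outputs)


def annotate(witnesses: Iterable[tuple]) -> Dict[tuple, Set[Config]]:
    """Group the (inputs, outputs) configurations of the concrete states that reach
    stability in p2 by abstract transition (p1, label, p2)."""
    table: Dict[tuple, Set[Config]] = defaultdict(set)
    for p1, label, p2, inputs, outputs in witnesses:
        table[(p1, label, p2)].add((frozenset(inputs.items()), frozenset(outputs.items())))
    return dict(table)


def to_dnf(configs: Set[Config]) -> str:
    """Render the configurations as one relational formula: a disjunction of cubes."""
    terms = []
    for inputs, outputs in sorted(configs, key=lambda c: (sorted(c[0]), sorted(c[1]))):
        lits = [name if value else "!" + name for name, value in sorted(inputs) + sorted(outputs)]
        terms.append("(" + " & ".join(lits) + ")")
    return " | ".join(terms)


# Hypothetical witnesses for the tank; valve_open is an invented output variable.
W = [
    ("Empty", "open", "Filling", {"in.flow": True}, {"valve_open": False}),
    ("Filling", "tau", "Full", {"in.flow": True}, {"valve_open": True}),
    ("Filling", "tau", "Full", {"in.flow": True}, {"valve_open": True}),
    ("Full", "close", "Empty", {"in.flow": False}, {"valve_open": False}),
    ("Full", "close", "Empty", {"in.flow": False}, {"valve_open": True}),
]
```

Duplicate witnesses collapse into one cube, while two witnesses that differ only in an output yield a two-cube disjunction, e.g. `(!in.flow & !valve_open) | (!in.flow & valve_open)` for the transition from Full to Empty.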


Example 4. The stability abstractions shown in Figs. 2 and 3 are equipped with guard constraints, as evaluations of the original input variable in.flow (shown in square brackets next to the label of the stimuli).
4 Algorithms for Stability Abstractions

In order to build the abstract automaton structure, we have to check, for every pair $(p_1, p_2) \in \Phi \times \Phi$, whether there exists a (reachable) $\sigma$-stable state $s_1$ in $p_1$ with $(s_1, \ell, s_2) \models Trans$ and $M^\tau, s_2 \models E(\neg\sigma\,U\,(\sigma \wedge p_2))$. Reachability analysis and CTL/LTL model checking for infinite-state systems are undecidable problems. The work in [5] computes overapproximations of the regions that are invariant for silent transitions (i.e., it addresses an unbounded, $AG$-based stability criterion), exploiting the abstract interpretation framework. This approach also overapproximates the multiple stable targets that may arise from the non-determinism of the concrete system.

Here, instead, we deal precisely with the non-determinism of the underlying 
concrete system by collecting information about actual, visible consequences of 
an action, by focusing on bounded stability definitions. In fact, we consider sta- 
bility criteria that do not require fixpoint computations in the concrete system, 
and we under-approximate the reachability analysis fixing a bound for unstable 
paths. Namely, our algorithm follows an iterative deepening approach, which 
considers progressively longer unstable run-to-completion paths, seeking the
next stable condition.

Intuitively, we search for concrete witnesses of an abstract transition $(p_1, \ell, p_2)$ by searching for a concrete path connecting a concrete $\sigma$-stable state $s_1$ in $p_1$ and a $\sigma$-stable state in $p_2$, with a bounded reachability analysis from $s_1$.

Notice that the algorithm builds a symbolic characterization of the stability automaton. In fact, instead of enumerating all $(p_1, p_2) \in \Phi \times \Phi$ and checking whether they are connected by some concrete path, we incrementally build a formula characterizing all the paths of $M^\tau$ connecting two $\sigma$-stable states. Then, we project such a formula on the $P$ variables, hence obtaining symbolically all the abstract transitions having a witness of that length. This intuition is similar to the one used in [15] to efficiently compute predicate abstractions.

Moreover, having a formula representing the finite paths of $M^\tau$ connecting two $\sigma$-stable states, we can extract guards and effects with a projection on the $I$ and $O$ variables. Namely, while checking the existence of an abstract transition, we also synthesize the formula on $I$ and $O$ annotating it.

A significant characteristic of our approach, also with respect to the classical 
instantiation of predicate abstraction, is that we refine the abstract transitions 
by forcing the concrete states to be reachable from the initial condition. 

In the following we describe the general algorithm for computing abstractions, parametric on the stability definition $\sigma$, and then show how the criteria proposed in Sect. 3.3 can actually be passed as a parameter.


4.1 Symbolic Algorithm for Bounded Stability 


Consider the symbolic encoding of the automaton $M = (X, C, \Sigma, Init, Invar, Trans)$,³ and a classification of the variables in $X$ distinguishing the Boolean predicate variables $P$, the input variables $I$ and the output variables $O$.


3 For exposition purposes, we let Trans entail both Invar and Invar'.
We address the computation of the formulae $Init_\sigma(P)$ and $Trans_\sigma(P, I, O, P')$ for a stability definition provided as a formula $\sigma(X_0, \dots, X_n)$ with $n \in \mathbb{N}$. The algorithm performs a reachability analysis based on two bounds:

- $U \in \mathbb{N}$, the bound on the length of unstable paths;
- $L \in \mathbb{N}$, with $L > n + 1$, the bound on the length of the run witnessing an abstract transition, starting from the initial state, used for the reachability-aware refinement.


Pseudocode 1. Reachability-aware symbolic computation of the abstract transition relation $Trans_\sigma$

 1: function EXTRACT-ABSTRACT-TRANS(Init, Trans)
 2:   Trans_σ := ⊥
 3:   S := new-solver()
 4:   S.ASSERT(Init(X_0))
 5:   for all j ∈ [0, L) do
 6:     S.ASSERT(Trans(X_j, Σ_j, X_{j+1}))
 7:     if j < n + 1 then continue
 8:     S.PUSH()
 9:     S.ASSERT(σ(X_{j−n}, ..., X_j))                                  ▷ stable slot at j
10:     for all i ∈ reversed[j − 1 − U, j) do
11:       if i + 1 < j then
12:         S.ASSERT(I_{i+1} = I_{i+2} ∧ ¬σ(X_{i+1−n}, ..., X_{i+1}))     ▷ unstable path
13:       S.PUSH()
14:       S.ASSERT(σ(X_{i−n}, ..., X_i) ∧ ⋀_{i−n≤h<i} I_h = I_{h+1})     ▷ stable slot at i
15:       S.ASSERT(¬Trans_σ[P/P_i, I/I_j, O/O_j, P′/P_j])
16:       Trans_σ^{(i,j)} := S.PROJECT-ON(P_i, I_j, O_j, P_j)
17:       Trans_σ := Trans_σ ∨ Trans_σ^{(i,j)}[P_i/P, I_j/I, O_j/O, P_j/P′]
18:       S.POP()
19:     S.POP()
    return Trans_σ


Computation of $Trans_\sigma$. Pseudocode 1 shows the algorithm for the extraction of the transition relation $Trans_\sigma$. It builds a formula
$$Init(X_0) \wedge \bigwedge_{0 \le h \le j} Trans(X_h, \Sigma_h, X_{h+1}) \wedge \sigma(X_{i-n}, \dots, X_i) \wedge \bigwedge_{i-n \le h < i} I_h = I_{h+1} \wedge \bigwedge_{i < h < j} \big(I_h = I_{h+1} \wedge \neg\sigma(X_{h-n}, \dots, X_h)\big) \wedge \sigma(X_{j-n}, \dots, X_j)$$
for each $i, j$ with $0 < j - i \le U$ and $j < L$. The procedure exploits the incrementality of SMT solvers, which organize assertions in a stack: the push/pop interface allows the addition of layers, in which new formulae are inserted with the ASSERT primitive. In this way, we can progressively build the path and avoid its recomputation for every pair $i, j$. Namely, for each $j < L$, we first build the path up to $j$ (line 6) and assert $\sigma$-stability at $j$ (line 9). Then we progressively try $i$ going backward (to better exploit incrementality), constraining the $I$ variables to be unchanged and the intermediate states to be $\sigma$-unstable (lines 11-12).

Function S.PROJECT-ON() (line 16) existentially quantifies all the other variables of the formula currently present in the solver stack. We preserve the variables $P_i$ and $P_j$, which characterize the two stable states connected by the transition. The variables $I_j$ and $O_j$ are also preserved: in this way, we extract the guard and effect formulae directly while building the abstract transition. Notice that, due to the input stability hypothesis preserved during the unstable path, the input configuration read in $j$ is the same as the one read immediately after the external event, in $i+1$.

Every contribution $Trans_\sigma^{(i,j)}$ found is then merged into a single $Trans_\sigma$, after renaming its variables to $P, I, O, P'$. An important optimization is to assert the negation of the already computed formula $Trans_\sigma$ (shifted to the current $i, j$ indices) before each projection (line 15), in order to avoid recomputing the same transitions.
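For intuition, the loop structure of Pseudocode 1 can be mirrored by an explicit-state enumeration that replaces the SMT solver: the depth-first search plays the role of the incremental path construction, the backward loop over i reproduces lines 10-17, and collecting tuples replaces S.PROJECT-ON. This is a sketch for small finite models only (it enumerates paths, so it is exponential in L), not the PySMT-based implementation; the tank states (location, value of in.flow) and the stability callbacks are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

State = Hashable


def extract_abstract_trans(init: Iterable[State], trans: Iterable[Tuple[State, str, State]],
                           sigma: Callable[[Tuple[State, ...]], bool], n: int,
                           preds: Callable, inputs: Callable, outputs: Callable,
                           L: int, U: int) -> Set[tuple]:
    """Enumerative analogue of Pseudocode 1, returning tuples (P_i, I_j, O_j, P_j).

    sigma(window) decides stability at position k from the window (X_{k-n}, ..., X_k)."""
    succ = {}
    for s, _label, t in trans:
        succ.setdefault(s, []).append(t)
    result: Set[tuple] = set()

    def check(path: List[State]) -> None:
        j = len(path) - 1
        if j < n or not sigma(tuple(path[j - n:])):
            return  # no stable slot at j
        # i = j-1, j-2, ..., max(n, j-U): the backward loop of lines 10-17.
        for i in range(j - 1, max(n, j - U) - 1, -1):
            h = i + 1
            if h < j:  # X_h joins the unstable part of the run-to-completion (line 12)
                if inputs(path[h]) != inputs(path[h + 1]) or sigma(tuple(path[h - n:h + 1])):
                    break  # the violated constraint also applies to every smaller i
            window = path[i - n:i + 1]
            same_inputs = all(inputs(a) == inputs(b) for a, b in zip(window, window[1:]))
            if sigma(tuple(window)) and same_inputs:  # stable slot at i (line 14)
                result.add((preds(path[i]), inputs(path[j]), outputs(path[j]), preds(path[j])))

    def dfs(path: List[State]) -> None:
        check(path)
        if len(path) < L:  # consider paths X_0, ..., X_j with j < L
            for t in succ.get(path[-1], ()):
                dfs(path + [t])

    for s in init:
        dfs([s])
    return result


# Hypothetical untimed tank: a state is (location, value of in.flow).
TANK = [
    (("Empty", False), "tau", ("Empty", False)), (("Empty", False), "open", ("Filling", True)),
    (("Filling", True), "tau", ("Filling", True)), (("Filling", True), "tau", ("Warning", True)),
    (("Filling", True), "close", ("Emptying", False)), (("Warning", True), "tau", ("Full", True)),
    (("Full", True), "tau", ("Full", True)), (("Full", True), "close", ("Emptying", False)),
    (("Emptying", False), "tau", ("Emptying", False)), (("Emptying", False), "tau", ("Empty", False)),
    (("Emptying", False), "open", ("Filling", True)),
]
```

With the non-urgent criterion (`n = 0`, `sigma = lambda w: w[-1][0] != "Warning"`), the result contains ("Filling", True, None, "Full") and no tuple mentioning Warning. Since every path starts from Init, transitions leaving unreachable states never show up, which corresponds to the reachability-aware behavior; a path in which a second stimulus changes in.flow before the next stable state is cut by the `break`, as in the definition of run-to-completion.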


Reachability-Awareness. A reachability-unaware version would drop the first part of the formula, characterizing the path from $0$ to $i - n$.

The described algorithm is reachability-aware, meaning that every considered stable state is, by construction, reachable from the initial condition $Init$. This is important to extract actually concretizable behaviors, and is a main difference with respect to the classical predicate abstraction technique: it is well known that the mere projection of the single-step transition relation on the Boolean predicates may introduce several spurious behaviors.

Note that the reachability-aware refinement is based on concrete reachability. In contrast, the algorithm of [5] exploits abstract reachability up to a fixpoint in the abstract automaton, possibly incurring further overapproximations induced by the use of convergence accelerators.


Computation of $Init_\sigma$. The algorithm for the extraction of the initial states $Init_\sigma$ is similar: it builds a formula
$$Init(X_0) \wedge \bigwedge_{0 \le h < i} \big(Trans(X_h, \Sigma_h, X_{h+1}) \wedge I_h = I_{h+1}\big) \wedge \sigma(X_{i-n}, \dots, X_i)$$
for every $i < U$. $Init_\sigma$ is the collection of the contributions $Init_\sigma^{(i)}$ obtained by fixing a stable slot in the last position $i$ and projecting on the $P_i$ variables.


4.2 Instantiating the Algorithm 


The bounded stability definitions presented in Sect. 3 can be unrolled and expressed in the form $\sigma(X_0, \dots, X_n)$.
Predicate Abstraction. $\sigma_1(X_0) = \top$ trivially needs only the current variables. Observe that in this case we can use the bound $U = 1$, since the instability constraint is always unsatisfiable.


Non-urgent Abstraction. Having a classification of urgent conditions, $\sigma_2(X_0) = \neg urg_0$ can also be established by looking only at the current variables (it only needs $n = 0$).


Eq-predicate Abstraction. More generally, given the bounds $K$ and $T$, we encode that the abstract region has not changed during the last $K$ steps and that at least $T$ time has elapsed using $n = K$ and
$$\sigma_3^{T,K}(X_0, \dots, X_K) = \bigwedge_{0 \le h < K} (P_h = P_{h+1}) \wedge (time_0 + T \le time_K).$$
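To see what the unrolled formula looks like for concrete bounds, the sketch below prints $\sigma_3^{T,K}(X_0, \dots, X_K)$ as an SMT-LIB 2 term. The naming scheme `<name>_h` for the copy of a predicate variable in $X_h$, the variable name `time_h`, and the predicate name `lamp` are assumptions made for illustration; a real encoding would also need the corresponding declarations.

```python
from typing import Sequence


def sigma3_smtlib(preds: Sequence[str], K: int, T: float) -> str:
    """Unrolled sigma_3^{T,K}(X_0, ..., X_K) as an SMT-LIB 2 term.

    Assumed naming: <name>_h is the copy of a predicate in X_h, time_h the copy of time."""
    conjuncts = [f"(= {p}_{h} {p}_{h + 1})" for h in range(K) for p in preds]
    # T is printed as a decimal literal so that the term is well-sorted for a Real time.
    conjuncts.append(f"(<= (+ time_0 {float(T)}) time_{K})")
    if len(conjuncts) == 1:
        return conjuncts[0]  # the SMT-LIB Core signature gives `and` at least two arguments
    return "(and " + " ".join(conjuncts) + ")"


print(sigma3_smtlib(["lamp"], K=1, T=7))
# prints: (and (= lamp_0 lamp_1) (<= (+ time_0 7.0) time_1))
```

The number of equalities grows as $K \cdot |P|$, so for the $K = 1$ instances used in the experiments the stability constraint stays a small addition to each stable slot.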


5 Experimental Evaluation 


We evaluate the applicability and the adequacy of stability abstractions for the 
reverse engineering of real-world Relay-based Railway Interlocking Systems. 


Relay-Based Railway Interlocking Systems (RRIS). RRIS are complex electro-mechanical circuits used to control stations and train traffic. Such systems
receive stimuli from an external environment, including both human operators 
(e.g., performing actions on buttons) and physical entities (e.g., a train passing 
on some sensors). In response, they control railway elements, like signaling lights 
or railway switches. Internally, they use relays to propagate signals: relays are 
electro-mechanical components which, when activated, change the position of an 
associated contact after a (possibly null) delay. 

The controlling logic implemented by RRIS is hidden by complex legacy 
internal optimizations performed over the years by numerous electro-mechanical 
engineers. For this reason, it is hard to understand their high-level behavior and 
highlight the connections between stimuli and observable railway properties. 

The experimental evaluation is based on real-world RRIS schematics that are intended to control level crossings and shunting routes. Using the tool NORMA [2], the considered RRIS have been modeled and automatically converted into timed transition systems in the syntax of Timed nuXmv [7]. The obtained models involve several real-valued variables (modeling voltages and currents in the circuits), changing according to the configuration of the Boolean variables (modeling the switches of the circuit). The discrete state changes when an external event updates the position of a switch, or as a consequence of the activation of an internal relay. Hence, these systems react to an external variation with a chain of internal transitions. The duration of the triggered run-to-completion process is important: urgent states are widely used to model the causality relation between the activation of an instantaneous relay and the action performed on the associated switch; timed relays may impose a short delay, so that the internal response is actually very fast and almost unobservable.
Table 1. Result of the abstraction of routesN RRIS benchmarks with different stability definitions.

| Test | X/I/O/P | Φ | σ | A_σ states (reach. unaware) | A_σ trans (reach. unaware) | Time (reach. unaware) | A_σ states (reach. aware) | A_σ trans (reach. aware) | Time (reach. aware) |
|---|---|---|---|---|---|---|---|---|---|
| routes01 | 54/2/1/3 | 8 | σ1 | 8 | 40 | 01s | 7 | 13 | 26s |
| | | | σ2 | 4 | 6 | 20s | 3 | 4 | 3m 09s |
| | | | σ3, T=1 | 3 | 4 | 15s | 2 | 2 | 1m 57s |
| | | | σ3, T=7 | 3 | 4 | 15s | 2 | 2 | 1m 57s |
| routes02 | 90/4/2/6 | 48 | σ1 | 48 | 768 | 22s | 11 | 22 | 1m 02s |
| | | | σ2 | 7 | 13 | 46s | 6 | 11 | 6m 16s |
| | | | σ3, T=1 | 5 | 9 | 38s | 4 | 7 | 4m 32s |
| | | | σ3, T=7 | 5 | 9 | 38s | 4 | 7 | 4m 20s |
| routes04 | 166/8/3/12 | 4096 | σ1 | - | - | TO | 49 | 97 | 3h 7m 03s |
| | | | σ2 | 29 | 83 | 1h 42m 29s | 25 | 48 | 2h 56m 46s |
| | | | σ3, T=1 | 17 | 52 | 1h 41m 17s | 13 | 24 | 2h 42m 04s |
| | | | σ3, T=7 | 17 | 52 | 1h 41m 10s | 13 | 24 | 2h 41m 55s |
Abstraction Modulo Stability of RRIS. The Timed nuXmv model checker was used to convert the models produced by NORMA into untimed transition systems in SMV. The algorithm presented in Sect. 4 has been implemented using the PySMT library [10] and the MathSAT5 SMT solver [8]. It requires as input a classification of the variables X, selecting the predicates P, the inputs I and the outputs O, which can be directly provided by railway domain experts. We choose as P the status of some relays or (Boolean variables associated with) linear predicates on the electrical variables, representing, for example, the status of a lamp.

Tables 1 and 2 report the number of variables X, P, I, O for each benchmark. Column Φ reports the size of the resulting abstract domain, obtained by considering all the consistent combinations of P predicates (with respect to the invariant of the model).

We show the results of Abstraction Modulo Stability with the stability definitions described in Sect. 3.3, using the algorithm of Sect. 4 with bounds L = 40 and U = 15. All the experiments ran on a 2.4 GHz CPU, with a timeout (TO) of 15 h and a memory limit of 20 GB.

Columns "A_σ states" and "A_σ trans" hold the number of abstract states and transitions, respectively, computed by counting the configurations of the predicate variables in the abstract automaton $A_\sigma$. Consistently with Corollary 1, the abstract automata have progressively fewer states as the stability definition becomes stricter.

Stability abstractions were used by railway experts from the Italian railway network company (RFI) to understand two main families of legacy RRIS.


Routes. routesN is an RRIS regulating the activation/deactivation of N shunting routes competing for the same resources. The implemented logic takes care of avoiding the simultaneous activation of conflicting routes. In such RRIS the inputs are the switches controlled by a human operator, attempting to enable/disable a route; the outputs are the status of some internal entities that we want to monitor; the predicates are the status of lamps representing whether the routes have been registered.

In the routes benchmarks the delays used in the run-to-completion processes are very small, so that in the abstract automata obtained (Table 1) there is no difference between $\sigma_3^{T=1}$ and $\sigma_3^{T=7}$ (i.e., if a state has stayed in the same predicate for 1 time unit, then it can also stay there for 7). These abstract automata clearly highlight the consequences of the requests of a human operator on the active/inactive status of the routes involved. As an example, the abstraction of routes02 (a circuit handling two routes) has only 4 stable states, which show that the routes are incompatible and that one of them has priority over the other; it disregards all the intermediate steps that the concrete system needs to progressively check the availability of the resources. These steps are visible with a less strict stability definition, such as $\sigma_1$ or $\sigma_2$.

Table 1 also evaluates the effectiveness of the reachability refinement. When dropping the prefix starting from the initial states of the concrete system, the algorithm considers several spurious behaviors. Especially in these benchmarks, the resulting abstract automaton also shows unreachable states (e.g., the ones in which two routes are in conflict), therefore reducing its relevance for reverse engineering. Moreover, the reach. unaware abstraction may be harder to compute, as it has to explore more transitions and more models in the guard and effect formulae.


Railway Switch. r-switch is a RRIS modeling a railway switch. It has several
externally controlled switches and only 4 relevant observations, which define its
abstract state. The schema can be instantiated as nominal (N) or as faulty (F) by
injecting faulty behaviors into some physical components. We consider three
versions: r-switch1 interacts with a free environment, showing a wide range of
circuit configurations; r-switch2 and r-switch3, instead, exploit some assumptions
on the environment and expose fewer inputs, and, although they use different
internal implementations, are supposed to guarantee the same control logic.

Table 2 reports the features of the abstract automata obtained for these
benchmarks. Here, during a run-to-completion process, some states dwell in the
same predicate for a time 1 < t < 7, so that they are visible in σ3,T=1 but are
skipped by σ3,T=7 when reporting the corresponding abstract transition.
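The effect of the dwell threshold T can be illustrated on an explicit trace. The sketch below is only an intuition aid under our own reading of σ3,T: a predicate valuation counts as stable when the system dwells in it for at least T time units. The paper defines its stability notions symbolically on the timed transition system, whereas this sketch works on a hypothetical list of (predicate valuation, dwell time) pairs. The names `stable_projection` and `trace` are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical trace format: (predicate valuation, dwell time in time units).
Step = Tuple[str, float]


def stable_projection(trace: List[Step], threshold: float) -> List[str]:
    """Return the valuations dwelt in for at least `threshold` time units.

    Shorter (transient) valuations are skipped, and consecutive repetitions
    of the same valuation are merged into one abstract state.
    """
    stable: List[str] = []
    for valuation, dwell in trace:
        if dwell >= threshold and (not stable or stable[-1] != valuation):
            stable.append(valuation)
    return stable


# A toy run-to-completion process in which "checking" lasts 3 time units.
trace = [("idle", 10.0), ("checking", 3.0), ("locked", 10.0)]
print(stable_projection(trace, 1.0))  # ['idle', 'checking', 'locked']
print(stable_projection(trace, 7.0))  # ['idle', 'locked']
```

With T = 1 the intermediate valuation is reported, and with T = 7 it is skipped. This mirrors, on a toy scale, the difference between the two σ3 columns.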

Again, the reach.unaware option reports more transitions. The difference is
especially evident in the nominal versions, as the faulty concrete system already
covers more behaviors. Even when the number of abstract transitions is the
same, the reach.aware option reports more precise guards and effects, i.e., each
annotating formula on I and O has fewer models.

By inspecting the abstract automata, the user can recover the triggering
conditions that lead the system to certain states (e.g., the states that appear in
r-switch1 but not in r-switch2). Namely, Aσ can highlight the enabling conditions
of certain behaviors, which may hold far from their final observable consequence
and are hard to inspect by hand. In this way, the user can also collect the
assumptions needed to avoid certain behaviors (e.g., to understand what changes
were made from the r-switch1 schema to the r-switch2 or r-switch3 schemas).

Table 2. Result of the abstraction of r-switch RRIS benchmarks with different
stability definitions.

                                              reach. unaware                 reach. aware
test         |X|  |I|  |O|  |P|  @   σ        Aσ states  Aσ trans  time      Aσ states  Aσ trans  time
r-switch1-N  128  18   3    4    12  σ1       -          -         TO        12         78        8h 12m
                                     σ2       12         112       4h 12m    12         94        2h 42m
                                     σ3,T=1   12         112       7h 47m    12         86        2h 30m
                                     σ3,T=7   12         112       7h 24m    12         66        2h 07m
r-switch1-F  128  18   3    4    12  σ1       -          -         TO        -          -         TO
                                     σ2       -          -         TO        12         112       7h 60m
                                     σ3,T=1   12         112       13h 12m   12         112       5h 24m
                                     σ3,T=7   12         112       14h 05m   12         112       4h 45m
r-switch2-N  127  17   3    4    12  σ1       12         102       8h 18m    12         74        3h 29m
                                     σ2       10         86        1h 56m    10         74        1h 18m
                                     σ3,T=1   10         86        2h 12m    10         66        1h 10m
                                     σ3,T=7   10         86        2h 31m    10         54        58m
r-switch2-F  127  17   3    4    12  σ1       -          -         TO        12         90        10h 34m
                                     σ2       10         86        4h 21m    10         86        2h 42m
                                     σ3,T=1   10         86        4h 30m    10         86        2h 12m
                                     σ3,T=7   10         86        4h 33m    10         86        1h 39m
r-switch3-N  121  16   3    4    12  σ1       12         102       3h 28m    12         74        2h 08m
                                     σ2       10         86        52m       10         74        52m
                                     σ3,T=1   10         86        1h 34m    10         66        51m
                                     σ3,T=7   10         86        1h 32m    10         54        44m
r-switch3-F  121  16   3    4    12  σ1       -          -         TO        12         90        4h 21m
                                     σ2       10         86        2h 46m    10         86        1h 38m
                                     σ3,T=1   10         86        2h 01m    10         86        1h 22m
                                     σ3,T=7   10         86        2h 16m    10         86        1h 24m

Finally, as expected, r-switch2 and r-switch3 have exactly the same
abstract automata for every stability definition and nominal/faulty
configuration, since they are two different implementations of the same
observable properties.


P-Stable Abstractions. We also tried the implementation of [5] for approximated
P-stable abstractions (σ4), which uses BDDs and convex polyhedra. On small
handcrafted models, like the tank system used as the running example, we could
run all the approaches and confirm the output automata described in Sect. 3.
Nonetheless, in the analysis of RRIS the approach of [5] turned out to be
impractical: it was unable to deal with any of the considered RRIS models due to
the high number of variables.

More importantly, in our case studies, σ4 would likely result in abstractions
that are too aggressive, hiding practically interesting states, such as the ones
that emerge from the analysis of run-to-completion processes with non-negligible
duration.
6 Conclusions


In this paper we presented a framework for the reverse engineering of legacy 
systems. Starting from a symbolic timed transition system, the framework sup- 
ports the construction of abstractions in the form of state machines with guards 
and effects over transitions. The abstractions are parameterized on the notion 
of stability. We propose an SMT-based algorithm for abstraction computation, 
and we instantiate it over several notions of stability. 

The results have been evaluated within an industrial project with the Italian 
Railway Network, on reverse-engineering tasks of complex relay-based interlock- 
ing circuits. The experimental analysis demonstrated that the approach is prac- 
tical, and able to construct abstractions for complex real-world circuits. Taking 
reachability into account allowed us to produce tighter, more informative repre- 
sentations of the system under inspection. Railway signaling engineers involved 
in the project considered the proposed approach adequate in terms of expres- 
siveness and able to provide substantial support in understanding the legacy 
RRIS. 

In the future, we will define an “anytime” version of the algorithms, so that the
abstraction can be incrementally visualized as the computation proceeds, and we
will leverage parallelization to increase efficiency. Given the positive feedback
from the RFI experts, we plan to integrate the proposed abstraction techniques
within a RRIS modeling front-end, and to apply them to a larger set of
interlockings.
Abstract. Koopman operator linearization approximates nonlinear systems
of differential equations with higher-dimensional linear systems.
For formal verification using reachability analysis, this is an attractive 
conversion, as highly scalable methods exist to compute reachable sets 
for linear systems. However, two main challenges are present with this 
approach, both of which are addressed in this work. First, the approx- 
imation must be sufficiently accurate for the result to be meaningful, 
which is controlled by the choice of observable functions during Koopman 
operator linearization. By using random Fourier features as observable 
functions, the process becomes more systematic than earlier work, while 
providing a higher-accuracy approximation. Second, although the higher- 
dimensional system is linear, simple convex initial sets in the original 
space can become complex non-convex initial sets in the linear system. 
We overcome this using a combination of Taylor model arithmetic and 
polynomial zonotope refinement. Compared with prior work, the result 
is more efficient, more systematic and more accurate. 


Keywords: Koopman operator · Reachability analysis · Polynomial
zonotopes · Random Fourier features · Formal verification


1 Introduction 


Despite recent advances, systems described by nonlinear ordinary differential 
equations are still hard to analyze, control, and verify. On the other hand, a 
powerful body of methods and theories exists for linear systems, making analysis,
control, and verification much easier, even for high-dimensional systems. The
efficiency of techniques related to reachability analysis for linear systems [4,6,15]
motivates the use of Koopman operator linearization, where a higher-dimensional 
linear system approximates the dynamic behavior of a nonlinear system. Koop- 
man operator techniques are also well-suited for data-driven approaches since 
the Koopman linearized system can be directly created from measurements, 
bypassing a potentially complex modeling step. The Koopman framework has 
been successfully applied to many applications, including control [26,28], state 
estimation [31] and recently, formal verification [5]. 

The main contribution of this paper is to advance the state-of-the-art in 
formal verification using reachability analysis on Koopman operator linearized 
systems. First, we improve the accuracy of the finite Koopman linearization 
by employing random Fourier features [29]. In contrast with an ad hoc, finite- 
dimensional feature space, random Fourier features leverage the powerful ker- 
nel trick from machine learning [36,38] to generate a computationally tractable 
mapping over an infinite-dimensional feature space. Second, we improve speed. 
Instead of using an SMT solver to reason over non-convex initial sets, we propose 
combining Taylor models with polynomial zonotope refinement. A comparison 
on the same nonlinear system benchmarks used in the earlier Koopman veri- 
fication work [5] demonstrates both the improved accuracy and the improved 
verification speed. 


1.1 Related Work 


The concept of Koopman operator linearization was originally introduced in 1931 
[22]. Instead of investigating the dynamic evolution of the original system state, 
the Koopman approach considers the evolution of so-called observable functions 
or observables defined by nonlinear transformations of the original system state. 
Since the set of all possible observables defines a vector space, it then holds that 
the dynamic behavior of every nonlinear system can be equivalently represented 
by an infinite-dimensional linear system. Because it is obviously infeasible to
handle infinite dimensions, a finite set of observables is used in practice. Given 
such a set, the system matrix resulting in the most accurate linear approximation 
of the original system behavior can be determined using extended dynamic mode 
decomposition [41]. 

Many different methods for determining good observables have been proposed:
Carleman linearization [7] equivalently represents the dynamic behavior of
polynomial systems with an infinite-dimensional linear system. The corresponding
observables are multivariate monomials, which are determined by repeatedly
computing the time derivative of the current observables. Terminating this
iteration after a certain number of steps yields a finite set of observables.
Carleman linearization can be extended to general nonlinear systems by using a
Taylor series expansion. A finite set of observables defines an exact linear
representation of the original system if the vector space spanned by the
observables is closed under the operation of Lie derivatives [34]. Consequently, a
natural approach is to refine an initial set of observables by removing
observables that violate the condition [34]. This concept can be extended to
obtain polynomial instead of linear representations for the original nonlinear
system [35]. Another class of
approaches uses neural networks as observables [16,43], where the weights of the 
network are trained on traces of the real system. Since these approaches usually 
train the system matrix together with neural networks, they circumvent the sub- 
sequent application of dynamic mode decomposition. If one aims to reason about 
the original system based on the Koopman linearization, some quantification of 
the approximation error is required. Several approaches derive error bounds for 
truncated Carleman linearization [3,12,24] considering quadratic systems [24], 
polynomial systems [12], as well as general nonlinear systems [3]. 

The main motivation for using the Koopman framework for reachability 
analysis is that reachable sets for linear systems can be computed efficiently 
[11,15,23] even for high-dimensional systems [2,4,6], while reachability analysis 
for nonlinear systems [1,8,27] is often computationally demanding and poten- 
tially results in large over-approximations. Another advantage is that the Koop- 
man approach can also be applied to data-driven systems where no model is 
available. Due to the nonlinear transformation of the initial state defined by 
the observables, reachability analysis for Koopman operator linearized systems
represents a special type of reachability problem. To the best of our
knowledge, only two approaches exist so far: The first approach [13] utilizes the error
bounds for quadratic systems [24] to compute an enclosure of the reachable set 
for weakly nonlinear systems based on a finite Carleman linearization, where 
interval arithmetic [17] is applied to enclose the image of the initial set through 
the observables. The second approach [5], which represents the work closest to 
our method, presents two different verification strategies: 1) Direct encoding of 
the nonlinear transformation defined by the observables using an SMT solver,
and 2) zonotope domain splitting, where the initial set is recursively split into 
smaller sets until the specification can be verified or falsified. 


1.2 Overview 


In this work we address the two main bottlenecks of formal verification for Koop- 
man operator linearized systems, which are the selection of observables and the 
computation of the image of the initial set through the nonlinear transforma- 
tion defined by the observables. In particular, while currently observables often 
have to be selected manually by the user, we generate observables in a systematic 
fashion using random Fourier features. As we demonstrate with numerical exper- 
iments, these observables yield high-accuracy approximations of the real system 
behavior. Moreover, while previous approaches either compute very conservative 
convex enclosures of the image through the observables [13] or have to split the 
initial set in order to achieve a desired precision [5], we calculate tight non-convex 
enclosures of the image by combining Taylor model arithmetic with polynomial 
zonotopes. To conduct collision checks between the resulting non-convex reach- 
able set enclosures and unsafe regions we then use a novel polynomial zonotope 
refinement strategy, which is significantly faster than the previous SMT solver
and zonotope domain splitting approaches [5]. 

The remainder of the paper is structured as follows: We first recapitulate 
some preliminary results that are required throughout the paper in Sect. 2. In 
the main part we then describe the systematic generation of observables using 
random Fourier features in Sect. 3, before we present our proposed verification 
algorithm in Sect. 4. Finally, we demonstrate the superior performance of random 
Fourier feature observables and our verification algorithm in comparison with 
existing techniques on various benchmark systems in Sect. 5. 


1.3 Notation 


In the remainder of this paper, we will use the following notations: Sets are
denoted by calligraphic letters, matrices by uppercase letters, vectors by
lowercase letters, and lists by bold uppercase letters. Given a vector
$b \in \mathbb{R}^n$, $b_{(i)}$ refers to the $i$-th entry. Given a matrix
$A \in \mathbb{R}^{n \times m}$, $A_{(i,\cdot)}$ represents the $i$-th matrix row,
$A_{(\cdot,j)}$ the $j$-th column, and $A_{(i,j)}$ the $j$-th entry of matrix row $i$.
Given a discrete set of positive integer indices $\mathcal{H} = \{h_1, \dots, h_w\}$
with $1 \leq h_i \leq m \; \forall i \in \{1, \dots, w\}$, $A_{(\cdot,\mathcal{H})}$ is used
for $[A_{(\cdot,h_1)} \, \dots \, A_{(\cdot,h_w)}]$, where $[C \; D]$ denotes the
concatenation of two matrices $C$ and $D$. The symbols $\mathbf{0}$ and $\mathbf{1}$
represent matrices of zeros and ones of proper dimension, the empty matrix is
denoted by $[\,]$, and $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix. Given an
ordered list $\mathbf{L} = (l_1, \dots, l_n)$, $\mathbf{L}_{(i)} = l_i$ refers to the
$i$-th entry and $|\mathbf{L}| = n$ denotes the number of elements in the list.
Moreover, the concatenation of two lists $\mathbf{L}_1$ and $\mathbf{L}_2$ is denoted by
$(\mathbf{L}_1, \mathbf{L}_2)$. The left multiplication of a matrix
$M \in \mathbb{R}^{m \times n}$ with a set $\mathcal{S} \subset \mathbb{R}^n$ is defined as
$M\mathcal{S} = \{Ms \mid s \in \mathcal{S}\}$, and the Cartesian product of two sets
is denoted by the $\times$ operator. We further introduce an $n$-dimensional
interval as $\mathcal{I} = [l, u]$, $\forall i \; l_{(i)} \leq u_{(i)}$, $l, u \in \mathbb{R}^n$.


2 Preliminaries 


Our approach utilizes several existing techniques and concepts, which we briefly
recapitulate here. We use the nonlinear system

$$\dot{x}_1 = x_1, \qquad \dot{x}_2 = x_2 - x_1^4 \tag{1}$$

in combination with the initial set $\mathcal{X}_0 = [-2, 2] \times [0, 4]$ as a running example
throughout this section.


2.1 Koopman Operator Linearization 


First, we describe the general concept of Koopman operator linearization [22].
Given a nonlinear system

$$\frac{\partial x}{\partial t} = f(x) \quad \text{with} \quad x \in \mathbb{R}^n, \; f: \mathbb{R}^n \to \mathbb{R}^n, \tag{2}$$

our goal is to find observables $g_i: \mathbb{R}^n \to \mathbb{R}$ such that the dynamics of the
resulting new variables $g_i(x)$ are linear:

$$\frac{\partial}{\partial t} g(x) = A\, g(x) \quad \text{with} \quad A \in \mathbb{R}^{m \times m}, \tag{3}$$

where $g(x) = [g_1(x) \; \dots \; g_m(x)]^T$ is the observable function. Since the new
variables $g_i(x)$ are functions of the original system state $x$, the linear system (3)
defines an equivalent representation of the dynamic behavior of the original
system (2). Usually, the number of observables $m$ is significantly larger than the
dimension $n$ of the original system.

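Let us demonstrate Koopman linearization for our exemplary system in (1).
By choosing the observables $g_1(x) = x_1$, $g_2(x) = x_2$, and $g_3(x) = x_1^4$ we obtain
the linear system

$$\frac{\partial}{\partial t} \begin{bmatrix} g_1(x) \\ g_2(x) \\ g_3(x) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} g_1(x) \\ g_2(x) \\ g_3(x) \end{bmatrix},$$

since $\partial g_1(x)/\partial t = \dot{x}_1 = x_1 = g_1(x)$, $\partial g_2(x)/\partial t = \dot{x}_2 = x_2 - x_1^4 = g_2(x) - g_3(x)$,
and $\partial g_3(x)/\partial t = 4 x_1^3 \dot{x}_1 = 4 x_1^4 = 4 g_3(x)$.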
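The chain-rule argument above can be checked numerically. The sketch below evaluates $\frac{\partial}{\partial t} g(x) = J_g(x) f(x)$ at random points of $\mathcal{X}_0$ and compares the result with $A\, g(x)$ for the matrix given above. The helper names (`f`, `g`, `jacobian_g`, `max_residual`) are our own, chosen for illustration.

```python
import random

# Running example (1): x1' = x1, x2' = x2 - x1^4
def f(x):
    x1, x2 = x
    return [x1, x2 - x1 ** 4]

# Observables g1 = x1, g2 = x2, g3 = x1^4 and their Jacobian
def g(x):
    x1, x2 = x
    return [x1, x2, x1 ** 4]

def jacobian_g(x):
    x1, _ = x
    return [[1.0, 0.0], [0.0, 1.0], [4.0 * x1 ** 3, 0.0]]

A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, -1.0],
     [0.0, 0.0, 4.0]]

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def max_residual(samples=1000, seed=0):
    """Largest entry of |J_g(x) f(x) - A g(x)| over random points of X0."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x = [rng.uniform(-2.0, 2.0), rng.uniform(0.0, 4.0)]
        lhs = matvec(jacobian_g(x), f(x))  # d/dt g(x) by the chain rule
        rhs = matvec(A, g(x))
        worst = max(worst, max(abs(a - b) for a, b in zip(lhs, rhs)))
    return worst

print(max_residual())  # only floating-point round-off remains
```

A residual at round-off level confirms that these three observables linearize (1) exactly. For a generic choice of observables, the residual would not vanish.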

The exact linearization using a finite number of observables demonstrated by
the example above is unfortunately only possible for a small number of special
systems. In practice, one therefore usually aims to determine a linear system (3)
that approximates the dynamic behavior of the nonlinear system (2) well enough.
Given observables $g_i(x)$, the system matrix $A$ resulting in the best approximation
can be determined by applying extended dynamic mode decomposition [41] to
traces of the original system. Since those traces can also be generated by
simulating black-box systems or by measuring the real system behavior, we do not
necessarily require a model (2) of the original system. This is one of the biggest
advantages of the Koopman framework and makes it well suited for data-driven
approaches. The approach we present in this work verifies
Koopman linearized systems using reachability analysis:


Definition 1. (Reachable set) Given an initial set $\mathcal{X}_0 \subset \mathbb{R}^n$, the reachable set
for a Koopman linearized system is

$$\mathcal{R}(t) := \{\xi(t, g(x_0)) \mid x_0 \in \mathcal{X}_0\},$$

where $\xi(t, g(x_0))$ is the solution to (3) at time $t \in \mathbb{R}_{\geq 0}$ for the initial state $g(x_0)$.


Consequently, to compute the reachable set for a Koopman linearized system,
one first needs to propagate the initial set through the nonlinear transformation
defined by the observables, followed by the calculation of the reachable set for the
linear system in (3) using a reachability algorithm. This procedure is visualized in
Fig. 1. Definition 1 defines the reachable set for the observables $g_i(x)$. However,
since safety specifications are typically defined on the original system state $x$
rather than on $g(x)$, we usually require the reachable set $\mathcal{R}_x(t)$ for the original
state for verification. This issue can easily be resolved by using the original
system state $x$ for the first $n$ observables $g_i(x) = x_{(i)}$, $i = 1, \dots, n$, in which case
$\mathcal{R}_x(t)$ can be obtained via projection: $\mathcal{R}_x(t) = [I_n \; \mathbf{0}]\, \mathcal{R}(t)$.
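The data-driven route can be sketched on the running example. We implement extended dynamic mode decomposition as a plain least-squares fit of a discrete-time matrix $K \approx e^{A \Delta t}$ from snapshot pairs $(g(x), g(x'))$. This is a common formulation, and [41] may differ in details. The traces are synthetic. They come from the closed-form solution $x_1(t) = a e^t$, $x_2(t) = (b + a^4/3) e^t - (a^4/3) e^{4t}$ of (1), which we derived for this sketch and which can be checked by substitution. Propagating single initial points and projecting them with $[I_n \; \mathbf{0}]$ only samples $\mathcal{R}_x(t)$, which gives an under-approximation. This is not the set-based computation used for verification. All function names are hypothetical.

```python
import math
import random

def flow(x, t):
    """Closed-form solution of (1) starting from x = (a, b)."""
    a, b = x
    return [a * math.exp(t),
            (b + a ** 4 / 3) * math.exp(t) - a ** 4 / 3 * math.exp(4 * t)]

def g(x):
    return [x[0], x[1], x[0] ** 4]

def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [vi] for row, vi in zip(M, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z

def edmd(pairs):
    """Least-squares K with g(x_next) ~ K g(x), one normal-equation solve per row."""
    m = len(pairs[0][0])
    gram = [[sum(p[i] * p[j] for p, _ in pairs) for j in range(m)] for i in range(m)]
    return [solve(gram, [sum(p[i] * q[row] for p, q in pairs) for i in range(m)])
            for row in range(m)]

def reach_points(K, x0s, steps):
    """Propagate g(x0) with K and project with [I_n 0] onto the first n = 2 entries."""
    out = []
    for x0 in x0s:
        z = g(x0)
        for _ in range(steps):
            z = [sum(k * zi for k, zi in zip(row, z)) for row in K]
        out.append(z[:2])
    return out

dt = 0.1
rng = random.Random(1)
pairs = []
for _ in range(200):
    x = [rng.uniform(-2.0, 2.0), rng.uniform(0.0, 4.0)]
    pairs.append((g(x), g(flow(x, dt))))
K = edmd(pairs)
print(reach_points(K, [[1.0, 2.0]], 5))  # close to flow([1.0, 2.0], 0.5)
```

The observables of the running example linearize (1) exactly, so the fitted K should reproduce $e^{A \Delta t}$ up to round-off. With a finite, approximate set of observables, the fit would instead only be the best linear approximation on the sampled data.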
Fig. 1. Schematic visualization of reachability analysis for Koopman linearized systems:
We first transform the initial set to the higher-dimensional observable space using $g(x)$,
then compute the reachable set of the linear system using the matrix exponential $e^{A \Delta t}$
with time-step size $\Delta t$, and finally obtain the reachable set in the original state space
via projection.


2.2 Taylor Model Arithmetic 


Taylor model arithmetic [25] can be utilized to compute tight non-convex enclo- 
sures for the image through a nonlinear function. It is based on a set represen- 
tation called Taylor models: 


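Definition 2. (Taylor model) Given a polynomial function $p: \mathbb{R}^s \to \mathbb{R}^n$, an
interval domain $\mathcal{D} \subset \mathbb{R}^s$, and an interval remainder $\mathcal{Y} \subset \mathbb{R}^n$, a Taylor model
$\mathcal{T}(x)$ is defined as

$$\forall x \in \mathcal{D}: \quad \mathcal{T}(x) := \{p(x) + y \mid y \in \mathcal{Y}\}.$$

The Taylor order $\kappa \in \mathbb{N}$ defines an upper bound for the polynomial degree of the
polynomial $p(x)$. The set defined by a Taylor model is

$$\{\mathcal{T}(x) \mid x \in \mathcal{D}\} = \{p(x) + y \mid x \in \mathcal{D}, \; y \in \mathcal{Y}\}.$$

For a concise notation we use the shorthand $\mathcal{T}(x) = \langle p(x), \mathcal{Y}, \mathcal{D} \rangle_T$.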


The general concept of Taylor model arithmetic is to define rules on how to
perform the arithmetic operations $+$, $-$, $\cdot$, and $/$ as well as elementary functions
such as $\sin(x)$ or $\sqrt{x}$ on Taylor models [25, Sec. 2]. Since every nonlinear function
represents a composition of arithmetic operations and elementary functions, the
image through the function can then be computed by successively evaluating
those rules. Given two one-dimensional Taylor models $\mathcal{T}_1(x) = \langle p_1(x), \mathcal{Y}_1, \mathcal{D} \rangle_T$
and $\mathcal{T}_2(x) = \langle p_2(x), \mathcal{Y}_2, \mathcal{D} \rangle_T$, the rules for addition and multiplication are for
example given as

$$\mathcal{T}_1(x) + \mathcal{T}_2(x) := \langle p_1(x) + p_2(x), \; \mathcal{Y}_1 + \mathcal{Y}_2, \; \mathcal{D} \rangle_T,$$
$$\mathcal{T}_1(x) \cdot \mathcal{T}_2(x) := \langle p_1(x) \cdot p_2(x), \; \mathcal{Y}_1 \cdot \mathcal{Y}_2 + \mathcal{I}_1 \cdot \mathcal{Y}_2 + \mathcal{Y}_1 \cdot \mathcal{I}_2, \; \mathcal{D} \rangle_T,$$

where $\mathcal{I}_1 = \{p_1(x) \mid x \in \mathcal{D}\}$ and $\mathcal{I}_2 = \{p_2(x) \mid x \in \mathcal{D}\}$. The rules for elementary
functions are obtained using a finite Taylor series expansion, where the order of
the Taylor series is equal to the Taylor order $\kappa$. For $\sin(x)$ we for example obtain
with $\kappa = 2$ the rule

$$\sin(\mathcal{T}_1(x)) := \langle \sin(c) + \cos(c)\,(p_1(x) - c) - 0.5 \sin(c)\,(p_1(x) - c)^2, \; \mathcal{Y}, \; \mathcal{D} \rangle_T,$$

where the expansion point $c$ is chosen as $c = p_1(c_d)$ with $c_d$ being the center of
the domain $\mathcal{D}$, and the interval $\mathcal{Y}$ computed according to [25, Sec. 2] encloses
the remainder of the Taylor series. Due to the finite Taylor series approximation,
Taylor model arithmetic yields a tight enclosure rather than the exact image.
The accuracy of the enclosure can be improved by choosing a larger Taylor order.
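A minimal sketch of the addition and multiplication rules for one-dimensional Taylor models follows. It rests on our own simplifications. Polynomials are stored as degree-to-coefficient dictionaries, and $\mathcal{I}_1$, $\mathcal{I}_2$ are replaced by interval enclosures obtained by summing monomial ranges, which is sound but possibly looser than the exact polynomial range. The truncation of the product polynomial to order $\kappa$ is omitted. The class and function names are hypothetical and not taken from [25].

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]
Poly = Dict[int, float]  # degree -> coefficient of a polynomial in one variable


def i_add(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def i_mul(a: Interval, b: Interval) -> Interval:
    prods = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(prods), max(prods))


def i_pow(a: Interval, k: int) -> Interval:
    """Exact range of x**k for x in a."""
    if k == 0:
        return (1.0, 1.0)
    lo, hi = a[0] ** k, a[1] ** k
    if k % 2 == 0 and a[0] <= 0.0 <= a[1]:
        return (0.0, max(lo, hi))
    return (min(lo, hi), max(lo, hi))


def poly_range(p: Poly, dom: Interval) -> Interval:
    """Interval enclosure of {p(x) | x in dom}, summing monomial ranges."""
    acc: Interval = (0.0, 0.0)
    for k, c in p.items():
        acc = i_add(acc, i_mul((c, c), i_pow(dom, k)))
    return acc


class TaylorModel:
    """One-dimensional Taylor model <p(x), Y, D>_T."""

    def __init__(self, p: Poly, rem: Interval, dom: Interval):
        self.p, self.rem, self.dom = p, rem, dom

    def __add__(self, other: "TaylorModel") -> "TaylorModel":
        p = dict(self.p)
        for k, c in other.p.items():
            p[k] = p.get(k, 0.0) + c
        return TaylorModel(p, i_add(self.rem, other.rem), self.dom)

    def __mul__(self, other: "TaylorModel") -> "TaylorModel":
        p: Poly = {}
        for k1, c1 in self.p.items():
            for k2, c2 in other.p.items():
                p[k1 + k2] = p.get(k1 + k2, 0.0) + c1 * c2
        r1 = poly_range(self.p, self.dom)    # encloses I1
        r2 = poly_range(other.p, other.dom)  # encloses I2
        # Y1*Y2 + I1*Y2 + Y1*I2; truncation of p to order kappa is omitted here
        rem = i_add(i_add(i_mul(self.rem, other.rem), i_mul(r1, other.rem)),
                    i_mul(self.rem, r2))
        return TaylorModel(p, rem, self.dom)

    def bounds(self) -> Interval:
        """Interval enclosure of {p(x) + y | x in D, y in Y}."""
        return i_add(poly_range(self.p, self.dom), self.rem)


x = TaylorModel({1: 1.0}, (0.0, 0.0), (-2.0, 2.0))
x4 = (x * x) * (x * x)
print(x4.p, x4.bounds())  # {4: 1.0} (0.0, 16.0)
```

Squaring twice reproduces the observable $x_1^4$ of the running example with a zero remainder. A nonzero input remainder propagates through the $\mathcal{Y}_1 \cdot \mathcal{Y}_2 + \mathcal{I}_1 \cdot \mathcal{Y}_2 + \mathcal{Y}_1 \cdot \mathcal{I}_2$ term.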

For our verification approach we apply Taylor model arithmetic to compute
the image of the initial set through the observable function. The initial set
$\mathcal{X}_0 = [-2, 2] \times [0, 4]$ for the exemplary system in (1) can be represented by the Taylor
model $\mathcal{T}(x) = \langle x, \emptyset, \mathcal{X}_0 \rangle_T$. Applying Taylor model arithmetic to the observable
function $g(x)$ defined by the observables $g_1(x) = x_1$, $g_2(x) = x_2$, and $g_3(x) = x_1^4$
then yields the Taylor model

$$\{g(x) \mid x \in \mathcal{X}_0\} \subseteq \Big\langle \begin{bmatrix} x_1 \\ x_2 \\ x_1^4 \end{bmatrix}, \; \emptyset, \; [-2, 2] \times [0, 4] \Big\rangle_T, \tag{4}$$

which represents the exact image in this case since the observables contain
polynomial functions only.


2.3 Set Representations 


In this work we use polynomial zonotopes to represent reachable sets, polytopes 
to represent unsafe sets, and zonotopes for efficient collision checking. Let us 
first introduce polytopes, for which we consider the halfspace representation: 


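Definition 3. (Polytope) Given a matrix $H \in \mathbb{R}^{s \times n}$ and vector $d \in \mathbb{R}^s$, the
halfspace representation of a polytope $\mathcal{P} \subset \mathbb{R}^n$ is defined as

$$\mathcal{P} := \{x \in \mathbb{R}^n \mid Hx \leq d\}.$$

We use the shorthand $\mathcal{P} = \langle H, d \rangle_P$.

A halfspace $\mathcal{H} \subset \mathbb{R}^n$ is a special case of a polytope consisting of a single inequality
constraint $h^T x \leq d$ with $h \in \mathbb{R}^n$, $d \in \mathbb{R}$. We use the shorthand $\mathcal{H} = \langle h, d \rangle_H$.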
Another special type of polytopes are zonotopes, which can be stored efficiently 
using so-called generators: 


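Definition 4. (Zonotope) Given a center vector $c \in \mathbb{R}^n$ and a generator matrix
$G \in \mathbb{R}^{n \times p}$, a zonotope $\mathcal{Z} \subset \mathbb{R}^n$ is defined as

$$\mathcal{Z} := \Big\{ c + \sum_{i=1}^{p} \alpha_i \, G_{(\cdot,i)} \;\Big|\; \alpha_i \in [-1, 1] \Big\},$$

where the scalars $\alpha_i$ are called factors. We use the shorthand $\mathcal{Z} = \langle c, G \rangle_Z$.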
Polynomial zonotopes are a novel non-convex set representation that was
originally introduced for reachability analysis of nonlinear systems [1]. We use
the sparse representation of polynomial zonotopes [20]:

Definition 5. (Polynomial zonotope) Given a constant offset $c \in \mathbb{R}^n$, a generator
matrix of dependent generators $G \in \mathbb{R}^{n \times h}$, a generator matrix of independent
generators $G_I \in \mathbb{R}^{n \times q}$, and an exponent matrix $E \in \mathbb{N}_0^{p \times h}$, a polynomial zonotope
$\mathcal{PZ} \subset \mathbb{R}^n$ is defined as

$$\mathcal{PZ} := \Big\{ c + \sum_{i=1}^{h} \Big( \prod_{k=1}^{p} \alpha_k^{E_{(k,i)}} \Big) G_{(\cdot,i)} + \sum_{j=1}^{q} \beta_j \, G_{I(\cdot,j)} \;\Big|\; \alpha_k, \beta_j \in [-1, 1] \Big\}.$$

The scalars $\alpha_k$ are called dependent factors since a change in their value affects
multiplication with multiple generators. Accordingly, the scalars $\beta_j$ are called
independent factors because they only affect multiplication with one generator.
We use the shorthand $\mathcal{PZ} = \langle c, G, G_I, E \rangle_{PZ}$.


Using polynomial zonotopes for verification has two main advantages: 


1. Due to the similarity with Taylor models, the set defined by a Taylor model
can be equivalently represented as a polynomial zonotope [20, Prop. 4].

2. Due to the similarity with zonotopes, tight enclosing zonotopes can be
computed efficiently for polynomial zonotopes [20, Prop. 5].
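The second advantage can be made concrete with a short sketch. It encodes the standard idea behind zonotope enclosures of polynomial zonotopes, and the precise statement of [20, Prop. 5] may differ. A monomial whose exponents are all even can only take values in $[0, 1]$. Its generator can therefore be split into a center shift $0.5\, G_{(\cdot,i)}$ and a generator $0.5\, G_{(\cdot,i)}$. All other monomials range over $[-1, 1]$ and keep their generator. We store $G$ and $G_I$ as lists of columns and $E$ as a list of its $p$ rows. This storage convention and the function names are our own assumptions.

```python
from typing import List, Tuple

Vec = List[float]


def pz_point(c: Vec, G: List[Vec], GI: List[Vec], E: List[List[int]],
             alpha: Vec, beta: Vec) -> Vec:
    """Evaluate one point of <c, G, G_I, E>_PZ (G and G_I stored column-wise)."""
    x = list(c)
    for i, col in enumerate(G):
        mono = 1.0
        for k, a in enumerate(alpha):
            mono *= a ** E[k][i]
        x = [xi + mono * gi for xi, gi in zip(x, col)]
    for j, col in enumerate(GI):
        x = [xi + beta[j] * gi for xi, gi in zip(x, col)]
    return x


def zonotope_enclosure(c: Vec, G: List[Vec], GI: List[Vec],
                       E: List[List[int]]) -> Tuple[Vec, List[Vec]]:
    """Enclosing zonotope <c', G'>_Z of a polynomial zonotope.

    If all exponents of a monomial are even, the monomial ranges over [0, 1],
    so half of its generator moves to the center and half stays a generator;
    otherwise the monomial ranges over [-1, 1] and the generator is kept.
    """
    center = [float(ci) for ci in c]
    gens: List[Vec] = []
    for i, col in enumerate(G):
        if all(E[k][i] % 2 == 0 for k in range(len(E))):
            center = [ci + 0.5 * gi for ci, gi in zip(center, col)]
            gens.append([0.5 * gi for gi in col])
        else:
            gens.append([float(gi) for gi in col])
    gens.extend([float(gi) for gi in col] for col in GI)
    return center, gens


# Running example: columns of G are [2,0,0], [0,2,0], [0,0,16]; E = [[1,0,4],[0,1,0]]
c = [0.0, 2.0, 0.0]
G = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 16.0]]
E = [[1, 0, 4], [0, 1, 0]]
zc, zG = zonotope_enclosure(c, G, [], E)
print(zc, zG)  # [0.0, 2.0, 8.0] [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 8.0]]
```

For the running example, the enclosure is the box $[-2, 2] \times [0, 4] \times [0, 16]$. This is tight in each coordinate, because the only even monomial $\alpha_1^4$ attains both ends of $[0, 1]$.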


For verification we therefore convert the Taylor model representing the image 
of the initial set through the observable function to a polynomial zonotope, for 
which collision checks with the unsafe sets can be efficiently realized using zono- 
tope enclosures that are iteratively refined by splitting the polynomial zonotope. 

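The conversion of the Taylor model in (4) corresponding to our running
example in (1) yields the following polynomial zonotope

$$\mathcal{PZ} = \Big\langle \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 16 \end{bmatrix}, [\,], \begin{bmatrix} 1 & 0 & 4 \\ 0 & 1 & 0 \end{bmatrix} \Big\rangle_{PZ} = \Big\{ \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} \alpha_1 + \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix} \alpha_2 + \begin{bmatrix} 0 \\ 0 \\ 16 \end{bmatrix} \alpha_1^4 \;\Big|\; \alpha_1, \alpha_2 \in [-1, 1] \Big\},$$

where the high-level idea of the conversion is to represent the interval domain
$\mathcal{D}$ with dependent zonotope factors $\alpha_i \in [-1, 1]$.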
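The idea of the conversion can be sketched for the polynomial part of a Taylor model on a box domain. Each variable is substituted as $x_k = m_k + r_k \alpha_k$, with $m_k$ the midpoint and $r_k$ the radius of the domain. The monomials are then expanded binomially, and the terms are collected by exponent vector. This sketch is our own and is not the construction of [20, Prop. 4]. It ignores the remainder $\mathcal{Y}$, which is empty in (4), and it does not handle a nonzero remainder. The column ordering of $G$ is our choice, made to match the matrices above. Polynomials are represented as dictionaries from exponent tuples to coefficients, and all names are hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple

Mono = Tuple[int, ...]
Poly = Dict[Mono, float]  # exponent vector -> coefficient


def substitute_box(p: Poly, lower: List[float], upper: List[float]) -> Poly:
    """Rewrite p on the box [lower, upper] via x_k = m_k + r_k * alpha_k, alpha_k in [-1, 1]."""
    n = len(lower)
    mid = [(l + u) / 2 for l, u in zip(lower, upper)]
    rad = [(u - l) / 2 for l, u in zip(lower, upper)]
    out: Poly = {}
    for expo, coeff in p.items():
        terms: Poly = {(0,) * n: coeff}
        for k, e in enumerate(expo):
            expanded: Poly = {}
            for t_expo, t_coeff in terms.items():
                # binomial expansion of (m_k + r_k * alpha_k) ** e
                for j in range(e + 1):
                    val = t_coeff * comb(e, j) * mid[k] ** (e - j) * rad[k] ** j
                    if val != 0.0:
                        key = t_expo[:k] + (t_expo[k] + j,) + t_expo[k + 1:]
                        expanded[key] = expanded.get(key, 0.0) + val
            terms = expanded
        for t_expo, t_coeff in terms.items():
            out[t_expo] = out.get(t_expo, 0.0) + t_coeff
    return out


def to_polynomial_zonotope(polys: List[Poly], lower: List[float], upper: List[float]):
    """Return (c, G, E): offset, generators (column-wise) and p x h exponent matrix rows."""
    subs = [substitute_box(p, lower, upper) for p in polys]
    zero = (0,) * len(lower)
    c = [s.get(zero, 0.0) for s in subs]
    monos = sorted({m for s in subs for m in s if m != zero},
                   key=lambda m: (sum(m), [-e for e in m]))
    G = [[s.get(m, 0.0) for s in subs] for m in monos]
    E = [[m[k] for m in monos] for k in range(len(lower))]
    return c, G, E


# Observables of the running example on X0 = [-2, 2] x [0, 4]
polys = [{(1, 0): 1.0}, {(0, 1): 1.0}, {(4, 0): 1.0}]
print(to_polynomial_zonotope(polys, [-2.0, 0.0], [2.0, 4.0]))
```

For the running example, this reproduces the offset, generator matrix, and exponent matrix shown above. A cross term such as $x_1 x_2$ would produce a generator with exponent column $[1 \; 1]^T$.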


1 In contrast to [20, Def. 1], we explicitly do not integrate the constant offset $c$ in $G$.
Moreover, we omit the identifier vector used in the original work [20] for simplicity.


3 Linearization via Fourier Features


We now present the automated generation of observables using random Fourier
features [10]. Let us first motivate why Fourier features are a good choice for
observables. For Koopman linearization, the observables $g(x)$ define a
transformation to a high-dimensional space. One commonly used approach to handle
such high-dimensional spaces efficiently is the kernel trick: In many algorithms
the data points $x, y \in \mathbb{R}^n$ only appear in the form of inner products $g(x)^T g(y)$.
In this case it suffices to define a kernel function $k(x, y)$ that represents the
similarity measure $g(x)^T g(y)$ between data points in the high-dimensional feature
space, rather than explicitly defining a transformation $g(x)$ to this space. Kernel
functions can also represent more general features that are not vectors and even
infinite-dimensional features, which motivates their application in the Koopman
framework. The kernel trick is mainly applied in machine learning techniques
[36], such as regression [38], clustering [18], and classification [39]. However, the
extended dynamic mode decomposition algorithm [41] can also be formulated in
terms of inner products [42], so that the kernel trick can be applied to Koopman
linearization. Rather than explicitly choosing observables $g(x)$, we can therefore
select a kernel function instead, which implicitly defines the observable function
$g(x)$ through the kernel's relation to an inner product space. Commonly used
kernels are radial basis function kernels, polynomial kernels, and spline kernels.
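The kernel trick is easiest to see on a kernel whose feature map is small enough to write down. For the degree-2 polynomial kernel $k(x, y) = (x^T y)^2$ on $\mathbb{R}^2$, the explicit map $\varphi(x) = [x_1^2, \; \sqrt{2} x_1 x_2, \; x_2^2]^T$ satisfies $k(x, y) = \varphi(x)^T \varphi(y)$. The sketch below checks this identity numerically. The choice of kernel and the names `phi` and `k_poly` are ours, for illustration only. For the radial basis function kernel, no finite-dimensional explicit map exists, which is why an approximation is needed.

```python
import math


def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def k_poly(x, y):
    """Polynomial kernel (x^T y)^2, evaluated without forming phi."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = [1.0, 2.0], [3.0, -1.0]
print(k_poly(x, y), dot(phi(x), phi(y)))  # equal up to round-off
```

The kernel evaluation costs one inner product in $\mathbb{R}^2$, whereas the explicit route first builds the three-dimensional features. This gap widens quickly for higher degrees and dimensions.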

The kernel trick cannot be applied directly to our reachability technique since
we require an explicit formulation of the observables $g(x)$. We therefore first
select a kernel function $k(x, y)$, and then determine observables $g(x)$ that yield a
good approximation of the kernel function $k(x, y) \approx g(x)^T g(y)$. Random Fourier
features are a common technique to approximate kernel functions [10,29]. They
are based on Bochner's theorem [33, Sec. 1.4.3], which links a weakly stationary
kernel function to a Fourier transform:

$$k(x, y) = \int e^{j \omega^T (x - y)} \, d\mu(\omega) = \mathbb{E}_\omega\big(e^{j \omega^T x} \, \overline{e^{j \omega^T y}}\big), \tag{5}$$

where the function $\mu: \mathbb{R}^n \to [0, 1]$ defines a probability distribution, $\mathbb{E}_\omega(\cdot)$
denotes the expected value with respect to $\omega$, $j$ is the imaginary unit, and $\bar{a}$
denotes the complex conjugate of a complex number $a \in \mathbb{C}$. The distribution
$\mu(\omega)$ associated with a specific kernel can be obtained by taking the inverse
Fourier transform of $k(x, y)$ [29]. We can collect $m$ samples from the distribution
$\mu(\omega)$ to approximate the expected value in (5), which finally yields

$$k(x, y) = \mathbb{E}_\omega\big(e^{j \omega^T x} \, \overline{e^{j \omega^T y}}\big) \approx \frac{1}{m} \sum_{i=1}^{m} \underbrace{e^{j \omega_i^T x}}_{g_i(x)} \, \underbrace{\overline{e^{j \omega_i^T y}}}_{\overline{g_i(y)}}.$$

The random Fourier features are the resulting observables $g_i(x)$ that
approximate the kernel function. Note that we can omit the constant factor $\frac{1}{m}$ since
extended dynamic mode decomposition will automatically scale the observables
accordingly. We consider real-valued kernels only, so we use Euler's formula
$e^{jx} = \cos(x) + j \sin(x)$ to simplify the random Fourier features to

$$g_i(x) = \sqrt{2} \cos(\omega_i^T x + b_i), \quad i = 1, \dots, m, \tag{6}$$

where the shift $b_i$ is selected uniformly from the interval $[0, 2\pi]$ and $\omega_i$ is drawn
randomly from the probability distribution $\mu(\omega)$ corresponding to the kernel
that is used. While this random selection might appear to be a disadvantage
at first sight, it is guaranteed that the random Fourier feature approximation 
converges to the exact kernel function when increasing the number of observables 
[29]. Moreover, we observed from our numerical experiments that changes in the 
values for $b_i$ and $\omega_i$ do not significantly influence the accuracy of the resulting
linear approximation. 

In summary, the random Fourier features presented above represent a
systematic method for selecting a finite set of accurate observables, which requires
only a few hyperparameters. These hyperparameters include the type of kernel
that is used, the kernel parameters, and the number of observables. For the
numerical experiments in this paper we use a radial basis function kernel

$$k(x, y) = e^{-\frac{\|x - y\|_2^2}{2 \ell^2}},$$

which contains the lengthscale $\ell$ as the only parameter. The probability
distribution $\mu(\omega)$ for this kernel is the multivariate normal distribution with
covariance matrix $\ell^{-2} \cdot I_n$ centered at the origin [29, Fig. 1].
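A minimal sketch of the observables in (6) for the radial basis function kernel follows. It draws $\omega_i \sim \mathcal{N}(\mathbf{0}, \ell^{-2} I_n)$ and $b_i \sim \mathcal{U}[0, 2\pi]$, then compares the Monte Carlo estimate $\frac{1}{m} \sum_i g_i(x) g_i(y)$ with the exact kernel. The factor $\frac{1}{m}$ is kept here only for this comparison. As noted above, it is dropped when the features are used as observables. The lengthscale, sample sizes, and function names are illustrative assumptions.

```python
import math
import random


def rbf(x, y, ell):
    """Radial basis function kernel exp(-||x - y||^2 / (2 ell^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * ell ** 2))


def sample_features(n, m, ell, seed=0):
    """Draw omega_i ~ N(0, ell^-2 I_n) and b_i ~ U[0, 2 pi] for m features."""
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / ell) for _ in range(n)] for _ in range(m)]
    shifts = [rng.uniform(0.0, 2 * math.pi) for _ in range(m)]
    return omegas, shifts


def observables(x, omegas, shifts):
    """Random Fourier features g_i(x) = sqrt(2) cos(omega_i^T x + b_i), Eq. (6)."""
    return [math.sqrt(2.0) * math.cos(sum(w * xi for w, xi in zip(om, x)) + b)
            for om, b in zip(omegas, shifts)]


def approx_kernel(x, y, omegas, shifts):
    """Monte Carlo estimate (1/m) sum_i g_i(x) g_i(y) of k(x, y)."""
    gx = observables(x, omegas, shifts)
    gy = observables(y, omegas, shifts)
    return sum(a * b for a, b in zip(gx, gy)) / len(gx)


omegas, shifts = sample_features(n=2, m=2000, ell=1.5, seed=0)
x, y = [0.3, -0.2], [1.0, 0.5]
print(rbf(x, y, 1.5), approx_kernel(x, y, omegas, shifts))
```

The estimation error shrinks roughly like $1/\sqrt{m}$, which is consistent with the convergence guarantee cited from [29]. In this sketch, the seed only changes which features are drawn.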


4 Verification Using Reachability Analysis 


We now present our novel verification algorithm for Koopman linearized systems, 
which is summarized in Algorithm 1. For simplicity we assume that the specifi- 
cation we aim to verify is described by a single unsafe set U, but the extension to 
multiple unsafe sets is straightforward. We first apply Taylor model arithmetic 
(see Sect. 2.2) to compute a tight non-convex enclosure for the image of the 
initial set $\mathcal{X}_0$ through the observable function $g(x)$ in Line 3. Since it simplifies
the computation of the zonotope enclosures required later on, we then convert 
the resulting Taylor model to a polynomial zonotope in Line 4. This polynomial 
zonotope is used as the initial set for the computation of the reachable set for 
the Koopman linearized system as performed in Line 5, for which we can use any 
reachability algorithm for linear systems. For simplicity we assume here that the 
obtained reachable sets are exact. In the general case where the exact reachable 
set cannot be computed one can for example incorporate the error measures 
from [14] and [40] into the verification algorithm. 

The problem we are now facing is that the reachable sets R_0, ..., R_{t_f/Δt} are represented by polynomial zonotopes, a set representation for which exact collision checks with the unsafe set U are computationally demanding. We resolve this issue by applying a novel polynomial zonotope refinement procedure in lines 6-19, where we recursively split the polynomial zonotopes until we can either verify or falsify the specification using zonotope enclosures of the split sets. In particular, we first enclose each polynomial zonotope in the queue L with a zonotope in Line 9. For a zonotope Z = ⟨c, G⟩_Z, collision checks with an unsafe set as performed in Line 10 are very efficient: if the unsafe set is a halfspace U = ⟨h, d⟩_H = {x | h^⊤x ≤ d}, we have according to [15, Sec. 5.1]


$$ (Z \cap U \neq \emptyset) \iff \Big( h^\top c - \sum_{i=1}^{p} \big| h^\top G_{(\cdot,i)} \big| \le d \Big) \tag{7} $$
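The halfspace check (7) reduces to a handful of inner products, because h^⊤c − Σ_i |h^⊤G_(·,i)| is the smallest value of h^⊤x over the zonotope. The following minimal Python sketch assumes U = {x | h^⊤x ≤ d} and stores the generators as a list of columns; the function names are hypothetical.

```python
def _dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def intersects_halfspace(c, G, h, d):
    """Return True iff the zonotope <c, G>_Z intersects U = {x | h^T x <= d}, using (7).

    c: center as a list of n floats; G: list of p generators, each a list of n floats.
    """
    # Minimum of h^T x over the zonotope: h^T c - sum_i |h^T g_i|
    min_value = _dot(h, c) - sum(abs(_dot(h, g)) for g in G)
    return min_value <= d


# Usage: the unit box touches the halfspace x1 + x2 <= -2 only at its corner.
box_center, box_gens = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
print(intersects_halfspace(box_center, box_gens, [1.0, 1.0], -2.0))
print(intersects_halfspace(box_center, box_gens, [1.0, 1.0], -2.1))
```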
Algorithm 1. Verification of Koopman linearized systems

Require: Koopman linearized system ġ(x) = A g(x), initial set X_0, final time t_f, specification given as an unsafe set U, time step size Δt, initial Taylor order κ_0.
Ensure: System is safe (res = ⊤) or unsafe (res = ⊥).
1: res ← ⊥, κ ← κ_0 (initialization)
2: repeat
3:   T(x) ← {g(x) | x ∈ X_0} (computed using Taylor model arithmetic with order κ)
4:   PZ ← T(x) (convert Taylor model to polynomial zonotope, see [20, Prop. 4])
5:   R_0, ..., R_{t_f/Δt} ← reachability analysis of ġ(x) = A g(x) for initial set PZ
6:   L ← (R_0, ..., R_{t_f/Δt}) (initialize queue of not yet verified sets)
7:   repeat
8:     PZ ← L_(1), L ← (L_(2), ..., L_(|L|)) (pop first element from queue)
9:     Z ← zonotope enclosure of PZ (see [20, Prop. 5])
10:    if Z ∩ U ≠ ∅ then (check if specification is satisfied, see (7) and (8))
11:      x_0, t ← most critical initial state and corresponding time
12:      if [I_n 0] e^{At} g(x_0) ∈ U then
13:        return (specification falsified → system is unsafe)
14:      else
15:        PZ_1, PZ_2 ← split PZ (see Prop. 1 and (11))
16:        L ← (L, PZ_1, PZ_2) (add new sets to queue)
17:      end if
18:    end if
19:  until L = () or splitting does not yield any further improvement
20:  κ ← κ + 1 (increase Taylor order)
21: until L = () (queue empty → no intersection with U)
22: res ← ⊤ (if this line is reached, no reachable set intersects U → system is safe)


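For general polytopes U = ⟨H, d⟩_P = {x | Hx ≤ d}, collision checks can be realized using linear programming:

$$ (Z \cap U \neq \emptyset) \iff (\delta = 0), \tag{8} $$

where

$$ \delta = \min_{\alpha, x} \; \| c + G\alpha - x \|_1 \quad \text{s.t.} \quad \alpha \in [-1,1]^p, \; Hx \le d. \tag{9} $$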


If the specification cannot be verified, we next try to falsify it in lines 11-13 by extracting from Z the initial point x_0 that is expected to violate the specification the most. For a halfspace U = ⟨h, d⟩_H, the vector of zonotope factors α = [α_1 ... α_p]^⊤ resulting in the largest violation is given by α = −sign(G^⊤h), where the signum function is interpreted elementwise. Since the factors α of the zonotope enclosure are related to the dependent factors of the original polynomial zonotope, and since polynomial zonotopes preserve dependencies during reachability analysis [21], we can then directly extract the initial point x_0 corresponding to α from the polynomial zonotope. For general polytopes, we can use the optimal α from the linear program in (9) to estimate the most critical initial point. If we can neither verify nor falsify the specification, we have a so-called spurious counterexample that arises due to the over-approximation introduced by the zonotope enclosure. We therefore split the polynomial zonotope in
Fig. 2. Reachable set for the Roessler system (see Sect. 5.1) at time t = 2.95, where polynomial zonotopes are depicted by solid lines, the corresponding zonotope enclosures are depicted by dashed lines, and the unsafe set is shown in orange. While the zonotope enclosure of the original polynomial zonotope is too conservative to verify the specification (left), splitting the polynomial zonotope once reduces the over-approximation enough for verification to succeed (right).


this case in Line 15 since splitting reduces the over-approximation in the zono- 
tope enclosure (see Fig. 2). The split sets are then added to the queue in Line 16, 
where we use a first-in, first-out scheme for the queue to detect easy falsifications 
fast before excessively splitting the sets. 
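The extraction of the most critical zonotope factors α = −sign(G^⊤h) for a halfspace U = {x | h^⊤x ≤ d} can be sketched in a few lines of Python. This is a hypothetical helper that only works at the zonotope level: mapping α back to an initial point x_0 through the dependent factors of the polynomial zonotope, as described above, is not shown.

```python
def most_critical_factors(G, h):
    """Factors alpha = -sign(G^T h) minimizing h^T (c + G alpha) over [-1, 1]^p,
    i.e., the zonotope point that violates h^T x <= d the most."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [-sign(sum(h_k * g_k for h_k, g_k in zip(h, g))) for g in G]


def zonotope_point(c, G, alpha):
    """Evaluate c + G alpha for generators stored as a list of columns."""
    point = list(c)
    for a, g in zip(alpha, G):
        for k, g_k in enumerate(g):
            point[k] += a * g_k
    return point


c, G, h = [0.0, 0.0], [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], [1.0, -1.0]
alpha = most_critical_factors(G, h)
print(alpha, zonotope_point(c, G, alpha))
```

A generator orthogonal to h gets the factor 0 here; any value in [−1, 1] would be equally critical for it.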

One remaining issue is that Taylor model arithmetic is not exact. Due to the over-approximation in the initial set, it can therefore happen that we can neither verify nor falsify the specification by splitting the polynomial zonotope. To solve this issue, we embed our whole algorithm into a repeat-until loop that iteratively increases the order κ used for Taylor model arithmetic (see Line 20). Since Taylor model arithmetic converges to the exact result as the order goes to infinity, we obtain a complete algorithm that is guaranteed to terminate. In practice, we can often avoid computationally expensive iterations of the outer loop by choosing the initial order κ_0 large enough. It remains to decide when to stop splitting the polynomial zonotopes and increase the Taylor order instead (see Line 19). The simplest method is to impose an upper bound on the number of recursive splits. A more sophisticated approach is to abort splitting if the distance between the most critical point [I_n 0] e^{At} g(x_0) and the unsafe set U is smaller than the over-approximation in the polynomial zonotope PZ, which is given by the independent generators.

Finally, we provide a closed-form expression for splitting a polynomial zono- 
tope since this operation is not specified in the original work [20]: 


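Proposition 1. (Split) Given a polynomial zonotope PZ = ⟨c, G, G_I, E⟩_PZ ⊂ ℝ^n and the index r ∈ {1, ..., p} of one dependent factor, the operation split(PZ, r) returns two polynomial zonotopes PZ_1, PZ_2 satisfying PZ_1 ∪ PZ_2 = PZ:

$$ PZ_1 = \langle c, \hat{G}^{(1)}, G_I, \hat{E} \rangle_{PZ}, \qquad PZ_2 = \langle c, \hat{G}^{(2)}, G_I, \hat{E} \rangle_{PZ} $$

with

$$ \hat{E} = \big[ \hat{E}_1 \ \cdots \ \hat{E}_h \big], \qquad \hat{E}_i = \begin{bmatrix} E_{(\{1,\dots,r-1\},i)} & E_{(\{1,\dots,r-1\},i)} & \cdots & E_{(\{1,\dots,r-1\},i)} \\ 0 & 1 & \cdots & E_{(r,i)} \\ E_{(\{r+1,\dots,p\},i)} & E_{(\{r+1,\dots,p\},i)} & \cdots & E_{(\{r+1,\dots,p\},i)} \end{bmatrix}, $$

$$ \hat{G}^{(k)} = \big[ \hat{G}^{(k)}_1 \ \cdots \ \hat{G}^{(k)}_h \big], \qquad \hat{G}^{(k)}_i = \big[ b^{(k)}_{i,0}\, G_{(\cdot,i)} \ \ b^{(k)}_{i,1}\, G_{(\cdot,i)} \ \cdots \ b^{(k)}_{i,E_{(r,i)}}\, G_{(\cdot,i)} \big], \quad k \in \{1,2\}, $$

$$ b^{(1)}_{i,j} = 0.5^{E_{(r,i)}} \binom{E_{(r,i)}}{j}, \qquad b^{(2)}_{i,j} = -0.5^{E_{(r,i)}} \big( 2\,(E_{(r,i)} \bmod 2) - 1 \big) \binom{E_{(r,i)}}{j}, $$

where x mod y, x, y ∈ ℕ_0, is the modulo operation and $\binom{w}{z}$, w, z ∈ ℕ_0, denotes the binomial coefficient. To remove redundancies, we subsequently apply the compact operation as defined in [20, Prop. 2] to PZ_1 and PZ_2.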


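Proof. The split operation is based on the substitution of the selected dependent factor α_r with two new dependent factors α_{r,1} and α_{r,2}:

$$ \{\alpha_r \mid \alpha_r \in [-1,1]\} = \{0.5(1+\alpha_{r,1}) \mid \alpha_{r,1} \in [-1,1]\} \cup \{-0.5(1+\alpha_{r,2}) \mid \alpha_{r,2} \in [-1,1]\}. \tag{10} $$

Inserting this substitution into the definition of polynomial zonotopes in Definition 5 yields

$$
\begin{aligned}
PZ &= \Big\{ c + \sum_{i=1}^{h} \Big( \prod_{k=1}^{p} \alpha_k^{E_{(k,i)}} \Big) G_{(\cdot,i)} + \sum_{j=1}^{q} \beta_j G_{I(\cdot,j)} \;\Big|\; \alpha_k, \beta_j \in [-1,1] \Big\} \\
&\overset{(10)}{=} \underbrace{\Big\{ c + \sum_{i=1}^{h} \Big( \prod_{k=1, k \neq r}^{p} \alpha_k^{E_{(k,i)}} \Big) \Big( \tfrac{1+\alpha_{r,1}}{2} \Big)^{E_{(r,i)}} G_{(\cdot,i)} + \sum_{j=1}^{q} \beta_j G_{I(\cdot,j)} \;\Big|\; \alpha_k, \beta_j, \alpha_{r,1} \in [-1,1] \Big\}}_{=\,PZ_1} \\
&\quad\; \cup \underbrace{\Big\{ c + \sum_{i=1}^{h} \Big( \prod_{k=1, k \neq r}^{p} \alpha_k^{E_{(k,i)}} \Big) \Big( -\tfrac{1+\alpha_{r,2}}{2} \Big)^{E_{(r,i)}} G_{(\cdot,i)} + \sum_{j=1}^{q} \beta_j G_{I(\cdot,j)} \;\Big|\; \alpha_k, \beta_j, \alpha_{r,2} \in [-1,1] \Big\}}_{=\,PZ_2}.
\end{aligned}
$$

Finally, with

$$ \Big( \tfrac{1+\alpha_{r,1}}{2} \Big)^{E_{(r,i)}} = b^{(1)}_{i,0} + b^{(1)}_{i,1}\,\alpha_{r,1} + b^{(1)}_{i,2}\,\alpha_{r,1}^2 + \dots + b^{(1)}_{i,E_{(r,i)}}\,\alpha_{r,1}^{E_{(r,i)}}, $$

$$ \Big( -\tfrac{1+\alpha_{r,2}}{2} \Big)^{E_{(r,i)}} = b^{(2)}_{i,0} + b^{(2)}_{i,1}\,\alpha_{r,2} + b^{(2)}_{i,2}\,\alpha_{r,2}^2 + \dots + b^{(2)}_{i,E_{(r,i)}}\,\alpha_{r,2}^{E_{(r,i)}}, $$

we obtain the equations above. □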
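The substitution in the proof translates almost line by line into code. The following Python sketch implements the split for the dependent part of a polynomial zonotope only: the center c and the independent generators G_I are unchanged by the split and are therefore not passed around, and the compact operation from [20, Prop. 2] is omitted. Generators are stored as a list of columns with one exponent vector per generator, indices are 0-based, and the function names are hypothetical.

```python
from math import comb, prod


def split(G, E, r):
    """Split a polynomial zonotope along dependent factor r (0-based), cf. Prop. 1.

    G: list of h dependent generators (lists of n floats)
    E: list of h exponent vectors (lists of p ints), so E[i][r] = E_(r,i)
    Returns (G1, G2, E_hat); c and G_I are shared by both halves.
    """
    G1, G2, E_hat = [], [], []
    for g, e in zip(G, E):
        deg = e[r]
        for j in range(deg + 1):
            # Coefficient of alpha_{r,1}^j in (0.5 (1 + alpha_{r,1}))^deg
            b1 = 0.5 ** deg * comb(deg, j)
            # Coefficient of alpha_{r,2}^j in (-0.5 (1 + alpha_{r,2}))^deg
            b2 = (-0.5) ** deg * comb(deg, j)
            G1.append([b1 * g_k for g_k in g])
            G2.append([b2 * g_k for g_k in g])
            E_hat.append(e[:r] + [j] + e[r + 1:])
    return G1, G2, E_hat


def evaluate(c, G, E, alpha):
    """Point c + sum_i (prod_k alpha_k^E_(k,i)) G_(.,i) for fixed dependent factors."""
    point = list(c)
    for g, e in zip(G, E):
        weight = prod(a ** e_k for a, e_k in zip(alpha, e))
        for k, g_k in enumerate(g):
            point[k] += weight * g_k
    return point


c = [1.0, -1.0]
G = [[1.0, 0.5], [0.0, 2.0], [0.3, -0.7]]
E = [[2, 1], [1, 0], [3, 2]]
G1, G2, E_hat = split(G, E, 0)
# alpha_r = 0.4 >= 0 is reached by PZ_1 with alpha_{r,1} = 2 * 0.4 - 1
print(evaluate(c, G, E, [0.4, -0.6]), evaluate(c, G1, E_hat, [2 * 0.4 - 1, -0.6]))
```

Every point of the original set with α_r ≥ 0 is reproduced by PZ_1 with α_{r,1} = 2α_r − 1, and every point with α_r ≤ 0 by PZ_2 with α_{r,2} = −2α_r − 1, which is exactly the union property PZ_1 ∪ PZ_2 = PZ.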
The split operation for polynomial zonotopes does not produce a partition: the two resulting sets usually overlap (see Fig. 2). To minimize the size of the overlapping region, we split the dependent factor with index r that maximizes the following heuristic:

$$ \max_{r \in \{1,\dots,p\}} \; \sum_{\substack{i=1 \\ E_{(r,i)} > 1}}^{h} \big( 1 - 0.5^{E_{(r,i)}} \big) \, \| G_{(\cdot,i)} \|_2, \tag{11} $$

where G ∈ ℝ^{n×h} and E ∈ ℕ_0^{p×h} are the generator and exponent matrix of the polynomial zonotope. Moreover, since the goal of splitting in Algorithm 1 is to verify a certain specification, it is advisable to first project the polynomial zonotope onto the halfspace normal directions of the unsafe set U before evaluating the heuristic (11), in order to direct the splitting process towards directions that are beneficial for verification.
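Heuristic (11) is cheap to evaluate. The Python sketch below scores every dependent factor and returns the best index. It assumes generators are stored as a list of columns with one exponent vector per generator, and it uses 0-based indices, so the returned index is one less than r in (11). The name `split_heuristic` is hypothetical.

```python
import math


def split_heuristic(G, E):
    """Index r (0-based) maximizing heuristic (11):
    sum over generators i with E[i][r] > 1 of (1 - 0.5**E[i][r]) * ||G_i||_2.

    G: list of h generators (lists of n floats); E: list of h exponent vectors (length p).
    """
    p = len(E[0])

    def score(r):
        return sum((1.0 - 0.5 ** e[r]) * math.sqrt(sum(g_k * g_k for g_k in g))
                   for g, e in zip(G, E) if e[r] > 1)

    # Ties are resolved in favor of the smallest index.
    return max(range(p), key=score)


G = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
E = [[2, 0], [1, 3], [0, 1]]
print(split_heuristic(G, E))
```

Here factor 0 scores 0.75 · 1 = 0.75 and factor 1 scores 0.875 · 2 = 1.75, so the second factor is split. Generators that enter linearly (exponent 1) are ignored, since substituting an affine expression for such a factor introduces no overlap. To follow the projection advice above, one can replace each generator g by the one-dimensional generator [h^⊤g] before calling the function.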

Note that the polynomial zonotope refinement technique presented in this 
section is not restricted to verification of Koopman linearized systems, but can 
equally be applied for collision checks of polynomial zonotopes or Taylor models 
with halfspaces and polytopes in general. Moreover, by inverting the inequality
constraints, polynomial zonotope refinement can also be applied to check if a
Taylor model or polynomial zonotope is contained in a halfspace or polytope. 


5 Experimental Results 


We now evaluate the performance of random Fourier feature observables and 
our novel reachability algorithm on various benchmark systems. For this, we 
compare our approach with the closest method from the literature [5]. Since the 
algorithms presented there are implemented in Julia, we also implemented our 
approach in Julia to obtain a fair comparison of the computation time. In our 
implementation, we use the package TaylorModels.jl² for Taylor model arithmetic and the package DataDrivenDiffEq.jl³ for extended dynamic mode decomposition. All computations are carried out on a 3.2 GHz 8-core AMD Ryzen 7 5800H processor with 16 GB memory. We published our implementation together with a repeatability package that reproduces the results shown in this paper as a CodeOcean compute capsule⁴.


5.1 Benchmarks 


Let us first define all benchmarks that we use for the evaluation. Again, we 
consider the same systems and specifications as in [5] for a fair comparison: 


2 https://github.com/JuliaIntervals/TaylorModels.jl
3 https://datadriven.sciml.ai/
4 https://codeocean.com/capsule/8730054/tree/v1
Roessler Attractor: The dynamic equations for the Roessler attractor [32] are

$$ \dot{x}_1 = -x_2 - x_3, \qquad \dot{x}_2 = x_1 + 0.2\,x_2, \qquad \dot{x}_3 = 0.2 + x_3\,(x_1 - 5.7), $$

and we consider the initial set X_0 = [−0.05, 0.05] × [−8.45, −8.35] × [−0.05, 0.05], the final time t_f = 6, and the unsafe region x_2 > 6.375 − 0.025 · i parameterized by i ∈ [0, 20].


Steam Governor: The dynamic equations for the steam governor [37] are

$$ \dot{x}_1 = x_2, \qquad \dot{x}_2 = x_3^2 \sin(x_1)\cos(x_1) - \sin(x_1) - 3\,x_2, \qquad \dot{x}_3 = \cos(x_1) - 1, $$

and we consider the initial set X_0 = [0.95, 1.05] × [−0.05, 0.05] × [0.95, 1.05], the final time t_f = 3, and the unsafe set x_2 < −0.25 + 0.01 · i parameterized by i ∈ [0, 10].


Coupled Van-der-Pol Oscillator: The dynamic equations for the coupled Van-der-Pol oscillator [30] are

$$ \begin{aligned} \dot{x}_1 &= x_2, & \dot{x}_3 &= x_4, \\ \dot{x}_2 &= (1 - x_1^2)\,x_2 - x_1 + (x_3 - x_1), & \dot{x}_4 &= (1 - x_3^2)\,x_4 - x_3 + (x_1 - x_3), \end{aligned} $$

and we consider the initial set X_0 = [−0.025, 0.025] × [0.475, 0.525] × [−0.025, 0.025] × [0.475, 0.525], the final time t_f = 2, and the unsafe set x_1 > 1.25 − 0.05 · i parameterized by i ∈ [1, 16].


Biological System: The dynamic equations for the biological system [19] are

$$ \begin{aligned} \dot{x}_1 &= -0.4\,x_1 + 5\,x_3 x_4, & \dot{x}_5 &= -5\,x_5 x_6 + 5\,x_3 x_4, \\ \dot{x}_2 &= 0.4\,x_1 - x_2, & \dot{x}_6 &= 0.5\,x_7 - 5\,x_5 x_6, \\ \dot{x}_3 &= x_2 - 5\,x_3 x_4, & \dot{x}_7 &= -0.5\,x_7 + 5\,x_5 x_6, \\ \dot{x}_4 &= 5\,x_5 x_6 - 5\,x_3 x_4, \end{aligned} $$

and we consider the initial set X_0 = [0.99, 1.01] × ⋯ × [0.99, 1.01], the final time t_f = 2, and the unsafe set x_4 < 0.883 + 0.002 · i parameterized by i ∈ [0, 20].
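For readers who want to simulate the benchmarks, the four vector fields of Sect. 5.1 can be transcribed directly. The following Python sketch does this and adds a classical fourth-order Runge-Kutta integrator sampled at the discrete time points used in Sect. 5.3 (Δt = 0.05). The integrator and all function names are illustrative assumptions, not part of the published Julia implementation. Note that a simulated trajectory entering the unsafe set can falsify a specification, but simulation alone cannot verify it.

```python
import math


def roessler(x):
    x1, x2, x3 = x
    return [-x2 - x3, x1 + 0.2 * x2, 0.2 + x3 * (x1 - 5.7)]


def steam_governor(x):
    x1, x2, x3 = x
    return [x2,
            x3 ** 2 * math.sin(x1) * math.cos(x1) - math.sin(x1) - 3.0 * x2,
            math.cos(x1) - 1.0]


def coupled_vdp(x):
    x1, x2, x3, x4 = x
    return [x2, (1 - x1 ** 2) * x2 - x1 + (x3 - x1),
            x4, (1 - x3 ** 2) * x4 - x3 + (x1 - x3)]


def biological(x):
    x1, x2, x3, x4, x5, x6, x7 = x
    return [-0.4 * x1 + 5 * x3 * x4,
            0.4 * x1 - x2,
            x2 - 5 * x3 * x4,
            5 * x5 * x6 - 5 * x3 * x4,
            -5 * x5 * x6 + 5 * x3 * x4,
            0.5 * x7 - 5 * x5 * x6,
            -0.5 * x7 + 5 * x5 * x6]


def rk4_trajectory(f, x0, t_final, dt):
    """Classical fourth-order Runge-Kutta; returns the states at 0, dt, ..., t_final."""
    steps = round(t_final / dt)
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        k1 = f(x)
        k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
        k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
        k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
        x = [xi + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
             for xi, a1, a2, a3, a4 in zip(x, k1, k2, k3, k4)]
        trajectory.append(x)
    return trajectory


# Usage: simulate the Roessler system from the center of X_0 up to t_f = 6.
traj = rk4_trajectory(roessler, [0.0, -8.4, 0.0], 6.0, 0.05)
print(len(traj), max(state[1] for state in traj))
```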


5.2 Approximation Error 


We first investigate the accuracy of the Koopman linearized system with 
respect to the original nonlinear dynamics, where we compare our random 
Fourier feature observables with the ad hoc observables from [5]. These ad hoc 
observables consist of multi-variate polynomials of the system state x up to a 
fixed order, trigonometric functions of the time t, and combinations of these 
Fig. 3. Relative simulation error between Koopman linearized systems and the original nonlinear system in percent.


(e.g., x_1 x_2 sin²(t) cos(t)). To obtain the data traces required for extended dynamic mode decomposition, we simulate the original nonlinear systems for 500 points sampled from the corresponding initial set, where a Sobol sequence is used for sampling. For the generation of the random Fourier feature observables according to (6), we use the parameters ℓ = 0.3 and m = 71 for the Roessler attractor, ℓ = 1.62 and m = 72 for the steam governor, ℓ = 1.24 and m = 132 for the coupled Van-der-Pol oscillator, and ℓ = 1.81 and m = 105 for the biological system, where ℓ is the lengthscale parameter of the kernel and the number of observables m is chosen identical to the one used for the ad hoc observables [5]. As a measure of accuracy, we use the Euclidean distance between simulated trajectories of the original nonlinear system and the Koopman linearized system. The initial points for these trajectories are the center and the vertices of the initial set. According to Fig. 3, random Fourier feature observables are more accurate than the ad hoc observables used in earlier work [5] for the steam governor and the Roessler attractor. Moreover, while for the short time horizons considered in Fig. 3 the ad hoc observables seem to be more precise for the coupled Van-der-Pol oscillator and the biological system, over longer time horizons the error of the ad hoc observables explodes. This is visualized in Fig. 4, where the trajectory corresponding to the ad hoc observables progresses in a completely different direction than the original system, while random Fourier features stay accurate. Thus, random Fourier features are not only a more systematic approach for choosing observables, but also improve the precision of the resulting Koopman linearized system.
Fig. 4. Comparison of simulations for Koopman linearized systems with the ground truth from the original nonlinear system for a time horizon of t_f = 10, where the biological system is shown on the left and the coupled Van-der-Pol oscillator is shown on the right.


5.3 Verification Using Reachability Analysis 


We now compare our novel verification algorithm for Koopman linearized systems with the verification strategies presented in [5]. In particular, we compare to verification of the original nonlinear system using Flow* [9], direct encoding of nonlinear constraints using an SMT solver [5, Sec. 4.1], and zonotope domain splitting [5, Sec. 4.4]. Both approaches from [5] consider discrete-time safety, where the system is considered to be safe if the specification is satisfied at the time points 0, Δt, 2Δt, ..., t_f with Δt = 0.05. While our verification algorithm also supports continuous-time safety, we consider discrete-time safety here to obtain a fair comparison. Note that for discrete-time safety, the reachable set computation in Line 5 of Algorithm 1 simplifies to R_i = [I_n 0] e^{A iΔt} PZ, i = 0, ..., t_f/Δt, where PZ is the polynomial zonotope enclosing the image of X_0 from Line 4. For the comparison, we consider both the ad hoc observables used in [5] and the random Fourier feature observables presented here.
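For discrete-time safety, the reachable sets only require repeated multiplication with one matrix exponential, since e^{A iΔt} = (e^{AΔt})^i and a linear map acts on a (polynomial) zonotope by mapping its center and generators while leaving the exponent matrix unchanged. The Python sketch below illustrates this with a truncated Taylor series for e^{AΔt}, which is adequate only when ‖AΔt‖ is small (a production code would use a scaling-and-squaring or Padé-based routine), and it tracks only the center and the dependent generators. The names are hypothetical, and this is not the Julia implementation used in the experiments.

```python
def mat_mul(A, B):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def expm_taylor(A, t, terms=30):
    """Truncated Taylor series sum_k (A t)^k / k!; only suitable for small ||A t||."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v * t / k for v in row] for row in mat_mul(term, A)]
        result = [[r + s for r, s in zip(rr, ss)] for rr, ss in zip(result, term)]
    return result


def discrete_reach(A, c0, G0, dt, n_steps, n_state):
    """Images of the lifted set <c0, G0> under e^{A i dt}, i = 0..n_steps,
    each projected onto the first n_state coordinates ([I_n 0])."""
    M = expm_taylor(A, dt)
    c, G = list(c0), [list(g) for g in G0]
    sets = []
    for _ in range(n_steps + 1):
        sets.append((c[:n_state], [g[:n_state] for g in G]))
        c = mat_vec(M, c)
        G = [mat_vec(M, g) for g in G]
    return sets


# Usage: a rotation generator, where e^{A t} is known in closed form.
A = [[0.0, 1.0], [-1.0, 0.0]]
sets = discrete_reach(A, [1.0, 0.0], [[0.1, 0.0]], 0.05, 20, 1)
print(sets[20])
```

For this A, the exact solution at t = 1 gives a projected center of cos(1), which makes the sketch easy to check.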

The resulting computation times for verification are summarized in Table 1. For all benchmark instances, our novel verification algorithm has the lowest computation time and is often even orders of magnitude faster than the other verification approaches. The main reason is that our polynomial zonotope refinement strategy completely avoids the computationally expensive calls to SMT solvers used by the other methods. Moreover, while the computation time of the other approaches often depends on how difficult it is to verify or falsify the specification, our algorithm exhibits roughly equal runtimes for all specifications. The explanation is that the polynomial zonotope refinement approach we use for the collision checks with unsafe sets is very efficient, so that the majority of the runtime is spent on computing the image through the observable function using Taylor model arithmetic, a task that is independent of the specification. Interestingly, using random Fourier features instead of ad hoc observables can either prolong or accelerate the verification process, depending on the benchmark instance and verification approach used. However, even if
Table 1. Computation time in seconds for verification or falsification of the benchmark systems from Sect. 5.1 using different approaches, where the symbol – indicates that the computation timed out after 2 h. The parameter i specified in the second column changes the specification, and the third column shows whether the specification can be verified (✓) or falsified (✗).

Benchmark    i    Safe?  Flow*   Direct Enc.         Zono. Split.        Our App.
                                 ad hoc   fourier    ad hoc   fourier    ad hoc   fourier
Coupled VP               251     788      398        0.57     171        0.20     3.00
Coupled VP   8    ✗      497     680      120        53       232        0.79     3.77
Coupled VP   6    ✗      1665    557      373        18       38         0.20     2.99
Biological   1    ✓      260     470      –          0.59     –          0.44     1.95
Biological   5    ✓      250     426      –          49       –          0.44     1.73
Biological   0    ✓      238     427      –          179      –          0.46     1.76
Steam        0    ✓      61      97       149        182      42         0.12     0.25
Steam        5    ✗      285     59       40         37       38         0.38     0.56
Steam        10   ✗      77      29       20         18       27         0.12     0.26
Roessler     0    ✓      55      81       291        9.53     117        0.55     0.35
Roessler     10   ✗      78      177      385        5.01     241        0.22     0.75
Roessler     20   ✗      55      174      158        3.5      86         0.21     0.34

they prolong the time required for verification in some cases, the usage of random Fourier feature observables can be justified by their superior accuracy demonstrated in Sect. 5.2. Another observation is that direct encoding and zonotope domain splitting are not able to verify or falsify the high-dimensional biological model at all if random Fourier feature observables are used. The reason is that both approaches apply an SMT solver for verification, and SMT solvers neither scale to high dimensions nor are well suited for handling the trigonometric functions and the strong coupling between variables that random Fourier feature observables introduce. In summary, our proposed verification algorithm outperforms the existing verification techniques for Koopman linearized systems [5] in terms of runtime. In addition, it handles different types of observables well and scales to high-dimensional systems.


6 Conclusion 


We presented two major improvements for reachability analysis of Koopman 
operator linearized systems: First, we use random Fourier features as observable 
functions, which yields a systematic approach requiring much less user insight 
than previous methods. Second, we handle the nonlinear transformation of the 
initial state by combining Taylor model arithmetic with polynomial zonotope 
refinement. As demonstrated on several nonlinear system benchmarks, the com- 
bination of these two techniques is both extremely accurate and extremely fast. 

The main trade-off with Koopman linearized systems is that the guarantees 
are on the system approximation, not the original system. Despite this, we believe 
the method could still be useful for verification in systems engineering, where the goal is to produce evidence that the system meets its requirements. It could also be effective for finding unsafe counterexamples (falsification), or for analyzing
systems where only simulation code is provided, or even real-world systems where 
sensor measurements could be used to create a Koopman linearized model for 
analysis. As such systems do not have models given with symbolic differential 
equations, most traditional reachability methods cannot be applied. 


Acknowledgements. This material is based upon work supported by the Air Force 
Office of Scientific Research, the DARPA Assured Autonomy program under the United 
States Air Force, and the Office of Naval Research under award numbers FA9550-19- 
1-0288, FA9550-21-1-0121, FA9550-22-1-0450, FA2386-17-1-4065, FA8750-19-C-0092, 
and N00014-22-1-2156. Any opinions, findings, and conclusions or recommendations 
expressed in this material are those of the author(s) and do not necessarily reflect the 
views of the United States Air Force, DARPA, or the United States Navy. Distribution 
Statement A: Approved for Public Release; Distribution is Unlimited. PA: AFRL-2022- 
1356. 


References 


1. Althoff, M.: Reachability analysis of nonlinear systems using conservative polyno- 
mialization and non-convex sets. In: Proceedings of the International Conference 
on Hybrid Systems: Computation and Control, pp. 173-182 (2013) 

2. Althoff, M.: Reachability analysis of large linear systems with uncertain inputs in the Krylov subspace. Trans. Autom. Control 65(2), 477-492 (2019)

3. Amini, A., et al.: Error bounds for Carleman linearization of general nonlinear 
systems. In: Proceedings of the International Conference on Control and its Appli- 
cations, pp. 1-8 (2021) 

4. Bak, S., et al.: Numerical verification of affine systems with up to a billion dimen- 
sions. In: Proceedings of the International Conference on Hybrid Systems: Com- 
putation and Control, pp. 23-32 (2019) 

5. Bak, S., et al.: Reachability of black-box nonlinear systems after Koopman opera- 
tor linearization. In: Proceedings of the International Conference on Analysis and 
Design of Hybrid Systems, pp. 253-258 (2021) 

6. Bogomolov, S., et al.: Reach set approximation through decomposition with low- 
dimensional sets and high-dimensional matrices. In: Proceedings of the Interna- 
tional Conference on Hybrid Systems: Computation and Control, pp. 41-50 (2018) 

7. Carleman, T.: Application de la théorie des équations intégrales linéaires aux 
systèmes d’équations différentielles non linéaires. Acta Math. 59, 63-87 (1932) 

8. Chen, X., et al.: Taylor model flowpipe construction for non-linear hybrid systems. 
In: Proceedings of the Real-Time Systems Symposium, pp. 183-192 (2012) 

9. Chen, X., Ábrahám, E., Sankaranarayanan, S.: Flow*: an analyzer for non-linear hybrid systems. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 258-263. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_18

10. DeGennaro, A.M., Urban, N.M.: Scalable extended dynamic mode decomposi- 
tion using random kernel approximation. SIAM J. Sci. Comput. 41(3), 1482-1499 
(2019) 


Reachability of Koopman Linearized Systems 509

11. Duggirala, P.S., Viswanathan, M.: Parsimonious, simulation based verification of linear systems. In: Proceedings of the International Conference on Computer Aided Verification, pp. 477-494 (2016)

12. Forets, M., Pouly, A.: Explicit error bounds for Carleman linearization. arXiv preprint arXiv:1711.02552 (2017)

13. Forets, M., Schilling, C.: Reachability of weakly nonlinear systems using Carleman linearization. In: Bell, P.C., Totzke, P., Potapov, I. (eds.) RP 2021. LNCS, vol. 13035, pp. 85-99. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-89716-1_6

14. Frehse, G., et al.: SpaceEx: scalable verification of hybrid systems. In: Proceedings of the International Conference on Computer Aided Verification, pp. 379-395 (2011)

15. Girard, A.: Reachability of uncertain linear systems using zonotopes. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 291-305 (2005)

16. Han, Y., et al.: Deep learning of Koopman representation for control. In: Proceedings of the International Conference on Decision and Control, pp. 1890-1895 (2020)

17. Jaulin, L., Kieffer, M., Didrit, O., Walter, E.: Interval analysis. In: Applied Interval Analysis, pp. 11-43. Springer (2001)

18. Kim, D.W., et al.: Evaluation of the performance of clustering algorithms in kernel-induced feature space. Pattern Recogn. 38(4), 607-611 (2005)

19. Klipp, E., et al.: Systems Biology in Practice: Concepts, Implementation and Application. Wiley, Hoboken (2005)

20. Kochdumper, N., Althoff, M.: Sparse polynomial zonotopes: a novel set representation for reachability analysis. Trans. Autom. Control 66(9), 4043-4058 (2021)

21. Kochdumper, N., et al.: Utilizing dependencies to obtain subsets of reachable sets. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control (2020)

22. Koopman, B.O.: Hamiltonian systems and transformation in Hilbert space. Proc. Natl. Acad. Sci. U.S.A. 17(5), 315-318 (1931)

23. Kurzhanski, A.B., Varaiya, P.: Ellipsoidal techniques for reachability analysis. In: Proceedings of the International Conference on Hybrid Systems: Computation and Control, pp. 202-214 (2000)

24. Liu, J.P., et al.: Efficient quantum algorithm for dissipative nonlinear differential equations. Proc. Natl. Acad. Sci. U.S.A. 118(35), e2026805118 (2021)

25. Makino, K., Berz, M.: Taylor models and other validated functional inclusion methods. Int. J. Pure Appl. Math. 4(4), 379-456 (2003)

26. Mauroy, A., Mezić, I., Susuki, Y. (eds.): The Koopman Operator in Systems and Control. LNCIS, vol. 484. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-35713-9

27. Mitchell, I.M., et al.: A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games. Trans. Autom. Control 50(7), 947-957 (2005)

28. Otto, S.E., Rowley, C.W.: Koopman operators for estimation and control of dynamical systems. Annu. Rev. Control Robot. Auton. Syst. 4, 59-87 (2021)

29. Rahimi, A., Recht, B.: Random features for large-scale kernel machines. In: Proceedings of the International Conference on Neural Information Processing Systems, pp. 1177-1184 (2007)

30. Rand, R., Holmes, P.: Bifurcation of periodic motions in two weakly coupled Van der Pol oscillators. Int. J. Non-Linear Mech. 15(4-5), 387-399 (1980)


510 S. Bak et al. 


31. Rauh, A., et al.: Carleman linearization for control and for state and disturbance estimation of nonlinear dynamical processes. In: Proceedings of the International Conference on Methods and Models in Automation and Robotics, pp. 455-460 (2009)

32. Rössler, O.E.: An equation for continuous chaos. Phys. Lett. A 57(5), 397-398 (1976)

33. Rudin, W.: Fourier Analysis on Groups. Courier Dover Publications, Mineola 
(2017) 


34. Sankaranarayanan, S.: Automatic abstraction of non-linear systems using change of 
bases transformations. In: Proceedings of the International Conference on Hybrid 
Systems: Computation and Control, pp. 143-152 (2011) 

35. Sankaranarayanan, S.: Change-of-bases abstractions for non-linear hybrid systems. 
Nonlinear Anal. Hybrid Syst. 19, 107-133 (2016) 

36. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2018)

37. Sotomayor, J., et al.: Bifurcation analysis of the Watt governor system. Comput. 
Appl. Math. 26(1), 19-44 (2007) 

38. Takeda, H., et al.: Kernel regression for image processing and reconstruction. Trans. 
Image Process. 16(2), 349-366 (2007) 

39. Tuia, D., et al.: Learning relevant image features with multiple-kernel classification. 
Trans. Geosci. Remote Sens. 48(10), 3780-3791 (2010) 

40. Wetzlinger, M., et al.: Adaptive parameter tuning for reachability analysis of linear 
systems. In: Proceedings of the International Conference on Decision and Control, 
pp. 5145-5152 (2020) 

41. Williams, M.O., et al.: A data-driven approximation of the Koopman operator: 
extending dynamic mode decomposition. J. Nonlinear Sci. 25(6), 1307-1346 (2015) 

42. Williams, M.O., et al.: A kernel-based method for data-driven Koopman spectral 
analysis. J. Comput. Dyn. 2(2), 247-265 (2015) 

43. Yeung, E., et al.: Learning deep neural network representations for Koopman oper- 
ators of nonlinear dynamical systems. In: Proceedings of the American Control 
Conference, pp. 4832-4839 (2019) 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 




RINO: Robust INner and Outer 
Approximated Reachability of Neural 
Networks Controlled Systems 


Eric Goubault and Sylvie Putot


LIX, École Polytechnique, CNRS and Institut Polytechnique de Paris, 91128 Palaiseau, France
{eric.goubault,sylvie.putot}@polytechnique.edu


Abstract. We present a unified approach, implemented in the RINO 
tool, for the computation of inner and outer-approximations of reach- 
able sets of discrete-time and continuous-time dynamical systems, pos- 
sibly controlled by neural networks with differentiable activation func- 
tions. RINO combines a zonotopic set representation with generalized 
mean-value AE extensions to compute under and over-approximations 
of the robust range of differentiable functions, and applies these tech- 
niques to the particular case of learning-enabled dynamical systems. 
The AE extensions require an efficient and accurate evaluation of the 
function and its Jacobian with respect to the inputs and initial condi- 
tions. For continuous-time systems, possibly controlled by neural net- 
works, the function to evaluate is the solution of the dynamical system. 
It is over-approximated in RINO using Taylor methods in time cou- 
pled with a set-based evaluation with zonotopes. We demonstrate the 
good performance of RINO compared to the state-of-the-art tools Verisig
2.0 and ReachNN* on a set of classical benchmark examples of neural
network controlled closed loop systems. For generally comparable preci- 
sion to Verisig 2.0 and higher precision than ReachNN*, RINO is always 
at least one order of magnitude faster, while also computing the more 
involved inner-approximations that the other tools do not compute. 


Keywords: Neural networks verification - Reachability analysis - 
Robustness - Inner-approximation 


1 Introduction 


Over the last few years, neural networks have emerged as an increasingly classical 
choice for the control of autonomous systems, in particular due to their properties 
as universal function approximators. However, their adoption in safety-critical 
systems, the inherent uncertainties from the dynamic environment, and their 
sensitivity to adversarial examples make it crucial to establish their safety and 
robustness. This verification is challenging because of the complex non-linear 
characteristics of neural networks. Recent works have proposed approaches and tools to bound the output uncertainty of neural networks with respect to input perturbations. However, many of them are restricted to the analysis of networks with ReLU activation functions. Moreover, the approaches that handle general differentiable activation functions and systems with general nonlinear dynamics provide over-approximations whose conservatism is difficult to estimate. RINO proposes a scalable and adaptive approach to compute both inner (or under) and outer (or over) approximations for the closed-loop reachability problem of neural network controlled systems with differentiable activation functions. The outer-approximation allows for property verification, while the inner-approximation allows for property refutation. Combined, the inner and outer-approximations make it possible to assess the conservatism of the approximations.

As the behavior of a neural network controlled closed-loop system relies on the interaction between the continuous dynamics and the neural network controller, good precision requires not only computing the output range but also describing the input-output mapping of the controller. In this work, we propose to use a zonotope-based abstraction to compute, in a unified way, the reachable sets of both neural networks and dynamical systems. This seamless integration of the reachability of neural networks and dynamical systems has the advantage of naturally propagating useful correlations through the different components of the closed-loop system, resulting in an efficient and precise approach compared to many existing works that rely on external reachability tools.


Contributions 


— RINO implements all ideas presented in [8-11] for the joint computation of inner and outer approximations of robustly reachable sets of differentiable nonlinear discrete-time or continuous-time systems (without neural networks in the loop), possibly with constant delays. These previous works demonstrated the good scaling properties of our approach on different examples, including a full nonlinear quadcopter flight model, but the tool itself was never presented as such.

— Additionally, we demonstrate here that applying these ideas to neural-network-enabled dynamical systems provides very competitive over-approximation results compared to the state of the art (at least similar precision and one order of magnitude faster), while also providing the first approach for inner-approximating the reachable sets of such systems, which we use to falsify some safety properties.

— Finally, RINO also computes approximations of output ranges that are reachable robustly or adversarially with respect to a subset of inputs. While these robust ranges are mostly used in this work to compute inner-approximations of joint ranges of state variables (instead of projections), we believe this sensitivity information can become a useful tool, in particular to assess global robustness properties of neural networks.


Related Work. Safety verification of DNNs has recently received considerable attention, with several threads of work being developed. Below, we give a non-exhaustive overview focusing on available tools for reachability analysis of neural network controlled systems with smooth activation functions.
Different approaches have been proposed for the reachability analysis of closed-loop systems with neural network controllers, often by reduction to continuous or hybrid system reachability. Sherlock [6] targets both the open-loop and closed-loop problems with ReLU activation functions, in particular using the regressive polynomial rule inference approach [5] for the closed loop, and Flow* [3] for the reachability of the dynamical system. NNV [24] also targets both open-loop and closed-loop verification problems, with various activation functions and set representations such as polyhedra or star sets [23], and different reachability algorithms for dynamical systems relying on CORA [1] and the MPT toolbox [18]. ReachNN [13] and its successor ReachNN* [7] propose a reachability analysis based on Bernstein polynomials for closed-loop systems with general activation functions, also relying on Flow* [3] for the reachability of the dynamical system. Verisig [14] handles NNCS with nonlinear plants controlled by sigmoid-based networks, exploiting the fact that the sigmoid is the solution to a differential equation to transform the neural network into an equivalent hybrid system, which is then fed to Flow*. Verisig 2.0 [15] uses preconditioned Taylor models to propagate reachable sets in neural networks, and also relies on Flow* for the reachability of the hybrid system component.

The very recent works [21] and [12], implemented respectively on top of JuliaReach and in POLAR, are also closely related to our work. In [21], the authors implement a bridge between zonotope abstractions and Taylor model abstractions in order to combine tools analyzing controllers (e.g. using zonotopes, like DeepZ [22]) with tools analyzing ordinary differential equations (e.g. Flow* [3]). In [12], the authors use a polynomial arithmetic combining Bernstein polynomials and Taylor models to iteratively over-approximate network layers, depending on whether the activation function is differentiable or not.


2 Problem Statement and Background 


2.1 Robust Reachability of Closed-Loop Dynamical Systems 


We consider in this work a closed-loop system consisting of a plant with states $x$, modeled as a discrete-time or continuous-time system with time-varying disturbances $w$ and inputs $u$, where some components of the control inputs can be the output of a neural network $h$ taking $x$ as input. For simplicity of notation, we focus on continuous-time systems and define:


$$\begin{cases} \dot{x}(t) = f(x(t), u(t), w(t)) & \text{if } t \geq 0 \\ x(t) = x_0 & \text{if } t = 0 \end{cases} \qquad (1)$$

where $f$ is a sufficiently smooth function, at least $C^1$, and the controls $u$ and disturbances $w$ are also supposed to be sufficiently smooth, namely piecewise $C^k$ for some $k > 0$. This allows discontinuous controls and disturbances, where the discontinuities can only appear at discrete times $t_j$.

The neural network $h$ is a fully-connected feedforward NN with differentiable activation functions, defined as the composition $h(x) = h_L \circ h_{L-1} \circ \dots \circ h_1(x)$ of $L$ layers, where each layer $h_i(x) = \sigma(W_i x + b_i)$ performs a linear transform followed by a sigmoid or hyperbolic tangent activation $\sigma$. We assume the control is decomposed as $u(t) = (u_1(t), u_2(t))$, where $u_2(t)$ is a control input defined in $U_2$ and $u_1(t)$ is the output of the neural network controller. This controller is executed in a time-triggered fashion with control step $T$, so that $u_1(t) = h(x(t_k))$ for $t \in [t_k, t_k + T)$, where $t_k = kT$ for nonnegative integers $k$. System (1) can then be rewritten as

$$\begin{cases} \dot{x}(t) = f(x(t), h(x(t_k)), u_2(t), w(t)) & \text{if } t \in [t_k, t_k + T),\ t_k = kT,\ k \geq 0 \\ x(t) = x_0 & \text{if } t = 0 \end{cases} \qquad (2)$$

Let $\varphi_f(t; x_0, u_2, w)$, for time $t \in \mathcal{T}$, denote the time trajectory of (2) with initial state $x(0) = x_0$, for input signal $u_2$ and disturbance $w$.
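To make the time-triggered structure of system (2) concrete, the following Python sketch simulates one nominal trajectory: the controller output $u_1 = h(x(t_k))$ is computed once at each $t_k = kT$ and held constant on $[t_k, t_k + T)$, while the plant is integrated in between. It is only an illustration under stated assumptions: the one-neuron controller and its weights are hypothetical, the plant borrows the B1 dynamics of Table 1 without inputs $u_2$ or disturbances $w$, and forward Euler stands in for the set-based Taylor expansions that RINO actually uses.

```python
import math

def controller(x):
    # Hypothetical stand-in for h: a single tanh neuron, u1 = tanh(w . x + b).
    # The weights are illustrative only, not taken from any benchmark.
    w, b = (-0.5, -1.0), 0.1
    return math.tanh(w[0] * x[0] + w[1] * x[1] + b)

def plant(x, u1):
    # B1-style dynamics: x1' = x2, x2' = u1 * x2^2 - x1
    return (x[1], u1 * x[1] ** 2 - x[0])

def simulate(x0, T=0.2, horizon=1.0, dt=0.01):
    """Forward-Euler simulation of a time-triggered closed loop.

    The controller is evaluated only at t_k = k*T; its output is held
    constant while the plant is integrated over [t_k, t_k + T).
    Returns the final state and the list of applied control values.
    """
    x = list(x0)
    substeps = round(T / dt)        # Euler steps per control period
    periods = round(horizon / T)    # number of control periods
    applied = []
    for _ in range(periods):
        u1 = controller(x)          # u1 = h(x(t_k)), sampled once per period
        applied.append(u1)
        for _ in range(substeps):
            dx = plant(x, u1)
            x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x, applied

x_final, controls = simulate((0.85, 0.55))
```

Holding `u1` fixed inside the inner loop is exactly what makes the closed loop a sampled-data system: between two control instants, the plant sees the controller only through a constant input.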


We consider the problem of computing inner and outer-approximations of robust reachable sets as introduced in [9], defined here as

$$R^f_{AE}(t; X_0, U_2, W) = \{x \mid \forall w \in W,\ \exists u_2 \in U_2,\ \exists x_0 \in X_0,\ x = \varphi_f(t; x_0, u_2, w)\}$$


Note that this notion of robust reachability extends the classical notions of minimal and maximal reachability [20]. We use the subscript notation $AE$ to indicate that the reachable set is minimal with respect to the disturbances $w$ (universal quantification, indicated by $A$) and maximal with respect to the input $u_2$ (existential quantification, indicated by $E$), and that the universal quantification always precedes the existential quantification.


2.2 Mean-Value Inner and Outer-Approximating Robust Extensions 


A classical but often overly conservative way to over-approximate the image of a set by a real-valued function $f : \mathbb{R}^m \to \mathbb{R}$ is the natural interval extension $F : \mathbb{IR}^m \to \mathbb{IR}$, where $\mathbb{IR}$ is the set of intervals with real bounds; it consists in replacing real operations by their interval counterparts in the expression of the function.

A generally more accurate extension relies on a linearization by the mean-value theorem. Mean-value extensions can be generalized to compute ranges that are robust to disturbances, identified as a subset of the input components. Let $f$ be a continuously differentiable function from $\mathbb{R}^m$ to $\mathbb{R}$ with input decomposed as $x = (u, w) \in U \times W \subseteq \mathbb{IR}^m$. We define the robust range of $f$ on $U \times W$, robust with respect to component $w \in W$, as $R_f(U, W) = \{z \mid \forall w \in W,\ \exists u \in U,\ z = f(u, w)\}$.
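As a small numerical illustration of the difference between the two extensions (our own example, not from the paper), take $f(x) = x - x^2$ on $X = [0, 1]$, whose true range is $[0, 0.25]$. The natural extension evaluates $X - X \cdot X$ and loses the dependency between the two occurrences of $x$; the mean-value extension $f(c) + f'(X)(X - c)$, with $c = 0.5$ and $f'(x) = 1 - 2x$, is tighter. The minimal interval type below ignores outward rounding, which a real interval library such as FILIB++ performs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        o = as_interval(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __sub__(self, other):
        o = as_interval(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __rsub__(self, other):
        return as_interval(other) - self

    def __mul__(self, other):
        o = as_interval(other)
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    __rmul__ = __mul__

def as_interval(v):
    # Promote a real number to a degenerate interval [v, v]
    return v if isinstance(v, Interval) else Interval(v, v)

def natural_extension(X):
    # Replace each real operation of f(x) = x - x*x by its interval counterpart
    return X - X * X

def mean_value_extension(X):
    # f(c) + f'(X) * (X - c), with f'(x) = 1 - 2x evaluated over X
    c = (X.lo + X.hi) / 2
    return (c - c * c) + (1 - 2 * X) * (X - c)
```

On $X = [0, 1]$ the natural extension returns $[-1, 1]$ and the mean-value extension returns $[-0.25, 0.75]$: both are sound enclosures of $[0, 0.25]$, and the mean-value one is half as wide here.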

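For a continuously differentiable function $f : \mathbb{R}^m \to \mathbb{R}^n$, we denote by $\nabla f = (\nabla_j f_i)_{ij} = \left(\frac{\partial f_i}{\partial x_j}\right)_{1 \leq i \leq n,\, 1 \leq j \leq m}$ its Jacobian matrix. We denote by $\langle x, y \rangle$ the scalar product of vectors $x$ and $y$, and by $|x|$ the absolute value extended componentwise. For a vector of intervals $\tilde{X} = [\underline{X}, \overline{X}]$, we denote by $c(\tilde{X}) = (\underline{X} + \overline{X})/2$ and $r(\tilde{X}) = (\overline{X} - \underline{X})/2$ its center and radius, defined componentwise.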


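Theorem 1. ([8], slightly simplified version of Thm. 2). Let $f$ be a continuously differentiable function from $\mathbb{R}^m$ to $\mathbb{R}$ and $X = U \times W \subseteq \mathbb{IR}^m$. Let $F^0$, $\nabla_u^X$ and $\nabla_w^X$ be vectors of intervals such that $f(c(X)) \subseteq F^0$, $\{|\nabla_u f(u, w)|,\ (u, w) \in X\} \subseteq \nabla_u^X$ and $\{|\nabla_w f(u, w)|,\ (u, w) \in X\} \subseteq \nabla_w^X$. We have:

$$\left[\overline{F^0} - \langle \underline{\nabla_u^X}, r(U) \rangle + \langle \overline{\nabla_w^X}, r(W) \rangle,\ \underline{F^0} + \langle \underline{\nabla_u^X}, r(U) \rangle - \langle \overline{\nabla_w^X}, r(W) \rangle\right] \subseteq R_f(U, W)$$

$$R_f(U, W) \subseteq \left[\underline{F^0} - \langle \overline{\nabla_u^X}, r(U) \rangle + \langle \underline{\nabla_w^X}, r(W) \rangle,\ \overline{F^0} + \langle \overline{\nabla_u^X}, r(U) \rangle - \langle \underline{\nabla_w^X}, r(W) \rangle\right]$$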


Theorem 1 provides inner and outer-approximations of the robust range (or of the classical range when there is no disturbance component $w$) of scalar-valued functions, or of the projections on each component of vector-valued functions, using bounds on the slopes over the input set. The result is useful to compute a projected range that is robustly reachable with respect to the disturbances $w$, or as a building block for computing an under-approximation of the image of a vector-valued function, as stated in Theorem 3 of [8].
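The two enclosures of Theorem 1 are simple enough to transcribe directly. The Python sketch below does so for a scalar $f(u, w)$, taking as inputs the interval $F^0$ and componentwise bounds on $|\nabla_u f|$ and $|\nabla_w f|$; in RINO these would come from affine-arithmetic evaluation, but here they are supplied by hand, and the function name and argument layout are our own. For the linear function $f(u, w) = 2u + w$ on $U = W = [-1, 1]$, the robust range is exactly $[-1, 1]$ (for every $w$, $z - w$ must lie in $2U = [-2, 2]$), and both bounds recover it. For $f(u, w) = u + 2w$ the disturbance dominates, the robust range is empty, and both returned intervals have their lower bound above their upper bound.

```python
def mean_value_ae(F0, du, dw, rU, rW):
    """Inner and outer bounds of the robust range R_f(U, W) (Theorem 1).

    F0     -- (lo, hi) interval enclosing f(c(X))
    du, dw -- lists of (lo, hi) bounds on |df/du_i| and |df/dw_j| over X
    rU, rW -- lists of radii of the components of U and W
    An interval whose lower bound exceeds its upper bound is empty.
    """
    def dot(bounds, radii, k):
        # Scalar product of the lower (k = 0) or upper (k = 1) slope bounds with the radii
        return sum(b[k] * r for b, r in zip(bounds, radii))

    inner = (F0[1] - dot(du, rU, 0) + dot(dw, rW, 1),
             F0[0] + dot(du, rU, 0) - dot(dw, rW, 1))
    outer = (F0[0] - dot(du, rU, 1) + dot(dw, rW, 0),
             F0[1] + dot(du, rU, 1) - dot(dw, rW, 0))
    return inner, outer
```

For linear functions the slope bounds are exact, so inner and outer bounds coincide; for nonlinear $f$ the gap between them measures the conservatism of the slope enclosures.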

Note that the accuracy of the mean-value AE extension can be improved with an evaluation by a quadrature formula ([10], Sect. 4.2). Alternatively, a second-order Taylor-based extension ([10], Sect. 3) can be used.


2.3 Reachability of Neural Network Controlled Closed-Loop Systems


The inner and outer approximations defined in Sect. 2.2 can be computed for $f$ being a simple function, possibly involving a neural network evaluation, for $f$ being the function defined by the iterated values of a discrete-time system, or finally for $f$ being the solution flow of the closed-loop system (2).

In both the discrete-time and continuous-time cases, and whether a neural network controller is present or not, applying Theorem 1 requires evaluating, over sets, an outer-approximation of the image of the solution and of its Jacobian with respect to inputs and disturbances.

In our work and implementation, we advocate the use of a single abstraction by affine forms (or zonotopes, for the geometric view of a tuple of variables represented by affine forms) for these sets and evaluations, including for the reachability analysis of the neural network controller. This abstraction is very convenient and versatile for over-approximating any smooth function, providing a good trade-off between efficiency and precision in most cases (for more precision, one can consider extensions such as polynomial zonotopes [2]).
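To illustrate why affine forms preserve correlations that plain intervals lose, here is a minimal Python sketch of the linear part of affine arithmetic: a form $x_0 + \sum_i x_i \varepsilon_i$ whose noise symbols $\varepsilon_i \in [-1, 1]$ are shared between variables. It is a toy (class and method names are ours), not the aaflib API, and it omits nonlinear operations, which in affine arithmetic add a fresh noise symbol bounding the linearization error. Linear operations, such as the affine part of each network layer, are exact in this representation.

```python
import itertools

_fresh = itertools.count(1)  # generator of fresh noise-symbol indices

class Affine:
    """Affine form c + sum_i a[i] * eps_i, with each eps_i ranging in [-1, 1]."""

    def __init__(self, center, coeffs=None):
        self.c = center
        self.a = dict(coeffs or {})

    @classmethod
    def from_interval(cls, lo, hi):
        # Each new input range gets its own noise symbol
        return cls((lo + hi) / 2, {next(_fresh): (hi - lo) / 2})

    def __add__(self, other):
        if not isinstance(other, Affine):
            return Affine(self.c + other, self.a)
        keys = self.a.keys() | other.a.keys()
        return Affine(self.c + other.c,
                      {k: self.a.get(k, 0.0) + other.a.get(k, 0.0) for k in keys})

    def scale(self, s):
        return Affine(self.c * s, {k: v * s for k, v in self.a.items()})

    def __sub__(self, other):
        return self + (other.scale(-1.0) if isinstance(other, Affine) else -other)

    def to_interval(self):
        # Concretization: interval hull of the (1D projection of the) zonotope
        r = sum(abs(v) for v in self.a.values())
        return (self.c - r, self.c + r)

x = Affine.from_interval(0.0, 1.0)
print((x - x).to_interval())             # (0.0, 0.0): the dependency cancels
print((x.scale(2.0) - x).to_interval())  # (0.0, 1.0); plain intervals give [-1, 2]
```

Two independently created forms do not share noise symbols, so their difference keeps the full interval width, exactly as interval arithmetic would.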

For continuous-time systems, we use Taylor expansions in time of the solution on a time grid. To build these Taylor expansions, we evaluate function $f$ and its (Lie) derivatives over affine forms by a combination of automatic differentiation and numerical evaluation by affine arithmetic, as described in e.g. [9]. The neural network is seen as a nonlinear function $h$, composed with $f$ to build a function $g$ for which we compute the solution flow. Theorem 1 is applied to this solution flow. We build the abstraction of $h$, and thus of $g$, by a simple propagation of affine forms through the network by affine arithmetic: linear transformers are exact, and we propagate affine forms through the activation functions, seen as standard nonlinear functions relying on the elementary exponential function, $\tanh(x) = 2/(1 + e^{-2x}) - 1$ and $\mathrm{sig}(x) = 1/(1 + e^{-x})$. For differentiating the activation functions, we use $\tanh'(x) = 1 - \tanh(x)^2$ and $\mathrm{sig}'(x) = \mathrm{sig}(x)(1 - \mathrm{sig}(x))$.
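The exponential-based formulas and derivative identities above can be sanity-checked numerically. The following Python sketch (function names are ours) implements them on floating-point numbers; comparing them with `math.tanh` and with central finite differences confirms the identities. The derivative forms are convenient because they reuse the already-computed activation value.

```python
import math

def tanh_via_exp(x):
    # tanh(x) = 2 / (1 + e^{-2x}) - 1
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def sig(x):
    # sig(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))

def dtanh(x):
    # tanh'(x) = 1 - tanh(x)^2, reusing the forward value
    t = tanh_via_exp(x)
    return 1.0 - t * t

def dsig(x):
    # sig'(x) = sig(x) * (1 - sig(x)), reusing the forward value
    s = sig(x)
    return s * (1.0 - s)

def central_difference(f, x, h=1e-6):
    # Second-order finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

print(dsig(0.0))   # 0.25
print(dtanh(0.0))  # 1.0
```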
3 Implementation


As mentioned in the introduction, RINO implements all ideas presented in [8-11] 
for the joint computation of inner and outer approximations of robustly reachable 
sets of differentiable nonlinear discrete-time [8,10] or continuous-time systems 
[8,9], possibly with constant delays [11]. For experiments with systems without 
neural networks, we refer to the results presented in these works, obtained with 
a previous version of RINO. 

RINO is written in C++. Intervals and zonotopes are used for set representation: the tool relies on the FILIB++ library [19] for interval computations and on the aaflib library¹ for affine arithmetic [4]. Ole Stauning's FADBAD++ library² is used for automatic differentiation: its template-based implementation lets us easily evaluate derivatives in the set representation of our choice (mostly affine forms or zonotopes). The tool takes as inputs:


— an open-loop or closed-loop system, either discrete-time or continuous-time, which for now is hard-coded in C++,

— an optional neural network, provided in a format directly inspired by the format analyzed by Sherlock [6], whose outputs can serve as some of the inputs of the closed-loop system,

— an optional configuration file to set initial values, input and disturbance ranges, and some parameters of the analysis (such as the time step and the order of the Taylor expansion in time).


It computes inner and outer-approximations of the projection of ranges on each component, as well as joint 2D and 3D inner-approximations (provided as YAML files and Jupyter/Python-produced figures). In addition to the classical ranges, RINO computes approximations of output ranges that are reachable robustly or adversarially with respect to disturbances, specified as a subset of inputs. In the experiments presented hereafter, we consider only examples of classical reachability, for which comparisons with existing work are available, but the extension to robust reachability based on our previous work is straightforward.


4 Experiments 


For space reasons, we focus here on the main novelty, which is the extension of this previous work to compute under- and over-approximations of (robust) reachable sets of neural network controlled systems (2).


Choice of Tools and Benchmark Examples. We compare RINO against ReachNN* and Verisig 2.0, which are the most recent fully-fledged reachability analyzers for neural network based control systems, and for which comparisons with other tools on classical benchmarks are well documented in e.g. [15]. They both improve on their previous versions, Verisig and ReachNN, and on the state-of-the-art tools Sherlock (also based on Flow*) and NNV. As noted in e.g. [15]: “Firstly, note that Verisig takes significantly more time to compute reachable sets (21 times slower in the case of the B5 benchmark). Furthermore, Verisig is unable to verify some properties due to increasing error. Note that NNV is unable to verify any of the properties considered in this paper due to high approximation error.” Note, however, that the internal solvers used in NNV have since been improved, which should qualify the latter statement (see e.g. [16]). We do not compare with the implementation in JuliaReach [21] since, first, timings are difficult to compare with an interpreted framework, and second, it would require mixing several tools together, with many potential combinations. We provide some elements of comparison with POLAR [12], but in many ways the latter addresses a different problem, with the emphasis on being able to handle e.g. ReLU activation functions.

1 http://aaflib.sourceforge.net.
2 http://www.fadbad.com.


Table 1. List of benchmarks (see [15]). A "?" marks a cell that is not legible in the source.

Name | Dynamics | Initial set | Horizon | Control step
Mountain Car | ẋ1 = x2; ẋ2 = 0.0015u − 0.0025 cos(3x1) | x1 ∈ [−0.5, −0.48], x2 ∈ [0, 0.001] | ? | ?
Discrete MC (step size 1) | x1^(n+1) = x1^n + x2^n; x2^(n+1) = x2^n + 0.0015u^n − 0.0025 cos(3x1^n) | x1 ∈ [−0.5, −0.48], x2 ∈ [0, 0.001] | T = 75 | 1
TORA | ẋ1 = x2; ẋ2 = −x1 + 0.1 sin(x3); ẋ3 = x4; ẋ4 = u | x1 ∈ [−0.77, −0.75], x2 ∈ [−0.45, −0.43], x3 ∈ [0.51, 0.54], x4 ∈ [−0.3, −0.28] | T = 5 | 0.1
ACC | ẋ1 = x2; ẋ4 = x5; ẋ2 = x3; ẋ5 = x6; ẋ3 = −4 − 0.0001x2² − 2x3; ẋ6 = 2u − 0.0001x5² − 2x6 | x1 ∈ [90, 91], x2 ∈ [32, 32.05], x4 ∈ [10, 11], x5 ∈ [30, 30.05] | ? | 0.1
B1 (Ex. 1 in [7]) | ẋ1 = x2; ẋ2 = u x2² − x1 | x1 ∈ [0.8, 0.9], x2 ∈ [0.5, 0.6] | T = 7 | ?
B2 (Ex. 2 in [7]) | ẋ1 = x2 − x1³; ẋ2 = u | x1 ∈ [0.7, 0.9], x2 ∈ [0.7, 0.9] | T = 1.8 | ?
B3 (Ex. 3 in [7]) | ẋ1 = −x1(0.1 + (x1 + x2)²); ẋ2 = (u + x1)(0.1 + (x1 + x2)²) | x1 ∈ [0.8, 0.9], x2 ∈ [0.4, 0.5] | ? | ?
B4 (Ex. 4 in [7]) | ẋ1 = −x1 + x2 − x3; ẋ2 = −x1(x3 + 1) − x2; ẋ3 = −x1 + u | x1 ∈ [0.25, 0.27], x2 ∈ [0.08, 0.1], x3 ∈ [0.25, 0.27] | T = 1 | 0.1
B5 (Ex. 5 in [7]) | ẋ1 = x1³ − x2; ẋ2 = x3; ẋ3 = u | x1 ∈ [0.38, 0.4], x2 ∈ [0.45, 0.47], x3 ∈ [0.25, 0.27] | T = 2 | 0.2
We use a large subset (7/10) of the examples from Verisig 2.0 [15], which are benchmarks used by most of the tools in the field, e.g. through the ARCH competition [17]. We also use the same initial sets and the same time horizons. These are recalled in Table 1.

We indicate some of RINO’s reachability results on these benchmarks in 
Table 2, before comparing the tightness and computing times with other tools. 


Table 2. RINO’s results for time step 0.05 (except Mountain Car, step 1.) 


Name over-approx under-approx t (s)|t docker 
Mountain Car [ — 0.78197, —0.64704] in 31. | 40.41 
sigmoid (2 x 200) | [ — 0.019387, —0.0093975] 
Discrete MC [ — 0.8711, —0.68326] [ — 0.82466, —0.7297] rae 
sigmoid (2 x 200) | [— 0.026888, —0.01411] | [ — 0.023716, —0.017282] 
(0.022471, 0.04829] [(0.029133, 0.041776] 
TORA [ — 0.80790, —0.78039] [ — 0.8037, —0.78452] Jell aa 
tanh (3 x 20) [ — 0.37201, —0.3433] 0 
(0.30682, 0.33235] i) 
[229.05, 230.29] [229.05, 230.29] 
[22.819, 22.868] [22.819, 22.868] 
ACC [ — 2.0285, — 2.0284] [ — 2.0285, — 2.0284] ey ae 
tanh [159.88, 161.02] [160.03, 160.87] 
[29.893, 30.006] i) 
[ — 0.30836, 0.01398] ) 
Bl [0.012957, 0.1349] i) 0.7) 0,92 
tanh (3 x 20) (0.18089, 0.23235] () 
B1 [0.10155, 0.15331] [0.12092, 0.13398] tel az 
sigmoid (3 x 20) [0.17188, 0.20041] 0 
B2 [ — 0.12356, —0.0811] 
L 0.2 | 0.21 
sigmoid (3 x 20) [0.16682, 0.26396] 
B3 [0.2256, 0.25296] [0.23507, 0.24352] ish ie 
tanh — 0.17777, —0.16092 i) 
Bi [ — 0.0017942, 0.010039] i) 
boot — 0.03494, —0.02305 [ — 0.032405, —0.02557] | 0.1 | 0,098 
[0.064524, 0.070953] ) 
ae — 0.42399, —0.38098 
(0.16388, 0.17547] l 2.7| 3.8 
tanh 
— 0.24869, —0.23363 


Settings. All tools (Verisig 2.0, ReachNN* and RINO) were run without GPU support, in an Ubuntu 18.04 Docker container, on a Mac running macOS Big Sur 11.2.3 with a 2.3 GHz Intel Core i9 processor and 16 GB of memory. Verisig 2.0 and ReachNN* were run with the Reproducibility Package of Verisig 2.0 [15]. For fairness of timing results, we also ran RINO with Docker, and the running-time ratios given in Table 3 are those obtained with these Docker versions. RINO was also run natively on the same Mac. From the full data given in Table 2, the performance degradation of the Docker version of RINO ranges from none to a 40% increase (with one exception at 80%), mostly between 20% and 30%. This is higher than what is generally observed with Docker, and is due to the fact that Docker on macOS is known to perform badly on I/O through the underlying file system. Therefore, the performance degrades more when the system has a higher dimension and more time steps to evaluate, since RINO logs all estimated ranges for all variables in separate files.


Comparison Results. We compare in Table 3 the running times of Verisig 2.0, ReachNN* and RINO, and the volumes of their final over-approximations, more precisely the widths of the projections on each component at the final time horizon.

The three tools depend on some parameters, in particular integration time steps and order of approximation. RINO requires less tuning of the integration time step and of the Taylor model order, so we use one fixed time step of 0.05 for all examples. For Verisig 2.0 and ReachNN*, we use the settings of the CAV Reproducibility Package, which we assume give good results. Verisig 2.0 and ReachNN* actually perform poorly on the same examples with a fixed time step of 0.05 s.

We experimented with different time steps in RINO. The precision is relatively stable and does not necessarily improve when decreasing the time step. Indeed, as already noted in [25], the improvement in approximation by Taylor models on smaller time steps is balanced by the loss of precision due to the set-based abstraction being performed more often. Note also that the analysis time does not depend linearly on the time step: the control step, which determines how often the neural network controller has to be analyzed, is fixed (see Table 1) and does not depend on the integration time step.
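The observation that the number of controller analyses does not depend on the integration time step is simple arithmetic, sketched below in Python with TORA's horizon and control step from Table 1 (the helper name is ours): halving the integration step doubles the number of Taylor-model steps, whereas the number of times the neural network controller must be abstracted stays fixed.

```python
def analysis_counts(horizon, control_step, integration_step):
    """Number of controller abstractions vs. Taylor-model integration steps."""
    controller_evals = round(horizon / control_step)
    taylor_steps = round(horizon / integration_step)
    return controller_evals, taylor_steps

print(analysis_counts(5.0, 0.1, 0.05))  # (50, 100)
print(analysis_counts(5.0, 0.1, 0.01))  # (50, 500)
```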

Column 2 in Table 3 gives the width of the interval computed by Verisig 2.0 for each variable at the final time and for each system, relative to the one computed by RINO. Column 4 is the same, but for ReachNN*. Columns 3 and 5 give the ratio of the analysis time of Verisig 2.0 (respectively ReachNN*) to the analysis time of RINO.
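Concretely, the percentages in columns 2 and 4 can be read as 100 × (width of the other tool's interval) / (width of RINO's interval) for the same variable, so values above 100% mean that RINO's enclosure is tighter. The Python helper below (our own, with made-up intervals rather than values from Table 3) spells this out.

```python
def width_ratio_percent(other, rino):
    """Width of another tool's final interval relative to RINO's, in percent."""
    other_width = other[1] - other[0]
    rino_width = rino[1] - rino[0]
    return 100.0 * other_width / rino_width

# Hypothetical final intervals for one state variable
print(width_ratio_percent((1.0, 4.0), (1.0, 3.0)))  # 150.0: RINO is tighter
```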

In all cases, RINO is much faster than both Verisig 2.0 and ReachNN*, by factors ranging from 13 to 638.5. Moreover, these timings include, for RINO, the computation of the inner-approximations, which Verisig 2.0 and ReachNN* do not compute. ReachNN* could not analyze TORA because of lack of memory on our platform, and timed out on ACC. Finally, consider the timings given in Table 1 of [12], e.g. for B1 (sig): Verisig 2.0 is reported to take 47 s, whereas POLAR is reported to take 20 s on their platform. As Verisig 2.0 took 81.33 s on our platform, we can infer that RINO (e.g. 3.62 s for B1) is very likely much faster than POLAR.
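The inference about POLAR rests on proportional scaling: if the two platforms differ by a constant slowdown factor, POLAR's reported 20 s can be rescaled by the ratio of the Verisig 2.0 timings measured on both platforms. The Python sketch below performs this back-of-the-envelope computation with the numbers quoted above; the constant-factor assumption is ours and only yields a rough estimate.

```python
def rescale(time_elsewhere, ref_elsewhere, ref_here):
    """Estimate a running time on our platform from a time reported elsewhere,
    assuming both platforms differ by the same factor as a reference tool."""
    return time_elsewhere * ref_here / ref_elsewhere

# B1 (sig), in seconds: Verisig 2.0 and POLAR as reported in [12],
# Verisig 2.0 and RINO as measured on our platform
polar_here = rescale(20.0, ref_elsewhere=47.0, ref_here=81.33)
rino_here = 3.62
print(f"POLAR ~ {polar_here:.1f} s, i.e. about {polar_here / rino_here:.0f}x RINO's time")
# POLAR ~ 34.6 s, i.e. about 10x RINO's time
```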

RINO’s precision is of the same order as Verisig 2.0, and always better than 
ReachNN* by a factor of about 2 to 10. RINO is in fact even substantially more 
precise than Verisig 2.0 in some cases (B1 and B2 in particular). 
Inner-Approximations. Let us take example B1 (with a sigmoid-based controller), and suppose we have a safety property stating that the value of $x_1$ should never be bigger than 1. Figure 1a represents the inner-approximation as a filled blue region, the bounds of the outer-approximation as plain black lines, and, as purple dots, values actually reached, obtained by trajectories from sample initial conditions. The over-approximation alone only raises a potential alarm with respect to the unsafe zone (in red); the inner-approximation actually proves that the safety property is falsified. We also note on this picture that the over-approximation is very tight, given that the samples give almost indistinguishable ranges. Figure 1b represents the inner and outer approximations of the joint range $(x_1, x_2)$, as well as an estimation by sampling. As shown by the samples, $(x_1, x_2)$ becomes almost a 1D curve after some time, making the inner-approximation extremely difficult to estimate. Indeed, our inner-approximation in orange is fairly precise for the first time steps, and the corresponding inner skewed boxes are rotated to match the curvy, 1D shape of the samples. The green boxes printed on the picture are the box enclosures of the actually computed outer-approximation. Note that the inner-approximation of the projections on each component can be non-empty while the joint inner range is empty, as some approximation is committed when computing the joint inner range (as a skewed box) from the projected ranges.

Table 3. Precision and running time comparisons: RINO [time step = 0.05] vs Verisig 2.0 [time steps of [15]] vs ReachNN* [time steps of [15]]. Widths are given per variable, in order.

Example | % width Verisig 2.0 over RINO | Time ratio Verisig 2.0/RINO | % width ReachNN* over RINO | Time ratio ReachNN*/RINO
TORA (tanh) | 117.68%; 98.4%; 106.7%; 128.03% | 38.6 | Mem full | Mem full
TORA (sig) | 115.7%; 68.0%; 110.18%; 133.34% | 43.4 | Mem full | Mem full
ACC (tanh) | 101.94%; 105.6%; 103.35%; 110.13%; 105.1%; 65.8% | 500.8 | Time out | Time out
B1 (tanh) | 84.9%; 287.8% | 88.8 | 96.7%; 245.0% | 85.1
B1 (sig) | 112.13%; 140.63% | 105.4 | 227.8%; 441.93% | 86.8
B2 (sig) | 263.2%; 60.43% | 77.6 | 408.8%; 513.74% | 121.9
B3 (tanh) | 99.53%; 98.5% | 575 | 103.94%; 287.34% | 81.9
B3 (sig) | 99.13%; 98.04% | 55.2 | 176.84%; 1043.94% | 76.4
B4 (tanh) | 105.03%; 101.63%; 108.7% | 187.9 | 224.2%; 130.64%; 896.0% | 214.6
B4 (sig) | 105.43%; 101.93%; 107.75% | 154.4 | 226.93%; 132.34%; 908.9% | 173.5
B5 (tanh) | 100.24%; 99.2%; 100.4% | 365.3 | 192.6%; 826.43%; 635.6% | 8.9
B5 (sig) | 100.23%; 99.1%; 100.4% | 360.2 | 192.53%; 851.63%; 1437.43% | 9.0
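The reasoning used in the B1 example, where the outer-approximation can prove a bound and the inner-approximation can refute it, fits in a few lines. The Python sketch below (function names and sample intervals are hypothetical) classifies a property $x \leq b$ from per-time-step inner and outer intervals, with `None`, or an interval whose lower bound exceeds its upper bound, standing for an empty inner-approximation.

```python
def step_verdict(inner, outer, bound):
    """Classify 'x <= bound' at one time step from inner/outer intervals."""
    if outer[1] <= bound:
        return "verified"       # every reachable value satisfies the bound
    if inner is not None and inner[0] <= inner[1] and inner[1] > bound:
        return "falsified"      # some value above the bound is surely reached
    return "inconclusive"       # only the over-approximation crosses the bound

def trajectory_verdict(steps, bound):
    """Combine per-step verdicts for 'x <= bound at all time steps'."""
    verdicts = [step_verdict(inner, outer, bound) for inner, outer in steps]
    if "falsified" in verdicts:
        return "falsified"
    if all(v == "verified" for v in verdicts):
        return "verified"
    return "inconclusive"
```

A single falsified step suffices to refute the property over the whole horizon, whereas verification needs every step's outer-approximation to stay below the bound.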


Fig. 1. B1: inner-approximation, outer-approximation and sampling (purple dots). (a) $x_1$ as a function of time; (b) joint range $(x_1, x_2)$. Legend: maximal outer approx., maximal inner approx., estimated reachable states. (Color figure online)


5 Conclusion and Future Work 


We presented the RINO tool, dedicated to the reachability analysis of dynamical systems, possibly controlled by neural networks. While providing accurate results, RINO is significantly faster than other state-of-the-art tools, which is key to addressing real-life reachability problems, where the systems and neural networks can be of high dimension. Moreover, as far as we are aware, it is the only existing tool that computes inner-approximations of the reachable sets of such systems. We currently handle only differentiable activation functions. We are considering abstractions to handle ReLU activations as well, even though the approach is less natural in that case, as it will introduce conservatism. We also plan to improve the accuracy of our current results by further specializing this work to exploit the structure of neural networks, such as the monotonicity of activation functions. Finally, robustness is a crucial property for neural-network-enabled systems, and we plan to explore the possibilities offered by the computation of robust reachable sets.
Abstract. We present the STLmc model checker for signal temporal logic (STL) properties of hybrid systems. The STLmc tool can perform STL model checking up to a robustness threshold for a wide range of hybrid systems. Our tool utilizes the refutation-complete SMT-based bounded model checking algorithm by reducing the robust STL model checking problem to Boolean STL model checking. If STLmc does not find a counterexample, the system is guaranteed to be correct up to the given bounds and robustness threshold. We demonstrate the effectiveness of STLmc on a number of hybrid system benchmarks.
1 Introduction


Signal temporal logic (STL) [31] has emerged as a popular property specification formalism for hybrid systems. STL formulas describe linear-time properties of continuous real-valued signals. Because hybrid systems exhibit both discrete and continuous behaviors, STL provides a convenient and expressive way to specify important requirements of hybrid systems. STL has a vast range of applications in hybrid systems, including automotive systems [26], robotics [24,40], medical systems [36], IoT [7], and smart cities [30].

Due to the infinite-state nature of hybrid systems with continuous dynamics, most techniques and tools for analyzing STL properties focus on monitoring and falsification. These techniques analyze concrete samples of signals obtained by simulating hybrid automata, either to monitor the system's behavior [13,15,32] or to find counterexamples [1,37,43], often combined with stochastic optimization. To this end, STL monitoring and falsification use a quantitative semantics that defines the robustness degree, which indicates how well the formula is satisfied. However, these methods cannot guarantee correctness.

Recently, several STL model checking techniques have been proposed for hybrid systems [3,29,35]. In particular, the SMT-based bounded model checking algorithms [3,29] are refutation-complete, i.e., they can guarantee correctness up to given bounds. However, these techniques are based on the Boolean semantics of STL instead of the quantitative semantics. This is a limitation for hybrid systems, as small perturbations of signals can cause the system to violate the properties verified by Boolean STL model checking. Moreover, there exists no tool with a convenient user interface implementing STL model checking techniques.
This paper presents the STLmc tool for robust STL model checking of hybrid systems. Our tool can verify that, up to given bounds, the robustness degree of an STL formula $\varphi$ is greater than a robustness threshold $\epsilon > 0$ for all possible behaviors of the system. We reduce the robust STL model checking problem to Boolean STL model checking using $\epsilon$-strengthening (perturbing the problem by $\epsilon$ to make it harder to be true), first proposed in [21] for first-order logic and extended to STL. We then apply the refutation-complete bounded model checking algorithm [3,29] to build the SMT encoding of the resulting Boolean STL model checking problem, which can be solved using SMT solvers.
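For intuition on ε-strengthening, consider only the propositional fragment, with negations pushed down to the state propositions so that every literal has the form $g(\vec{x}) > 0$ with robustness $g(\vec{x})$. Strengthening then replaces each literal by $g(\vec{x}) - \epsilon > 0$, and, because conjunction and disjunction are interpreted by min and max, the Boolean truth of the strengthened formula coincides with a robustness greater than $\epsilon$. The Python sketch below (a toy formula representation of our own, not STLmc's input language or implementation) illustrates this; it does not cover temporal operators or the bounded model checking encoding of [3,29].

```python
# Formulas in negation normal form, as nested tuples:
#   ("lit", g)      the state proposition g(state) > 0
#   ("and", a, b)   conjunction;   ("or", a, b)   disjunction

def robustness(phi, s):
    tag = phi[0]
    if tag == "lit":
        return phi[1](s)
    values = (robustness(phi[1], s), robustness(phi[2], s))
    return min(values) if tag == "and" else max(values)

def holds(phi, s):
    # Boolean semantics
    tag = phi[0]
    if tag == "lit":
        return phi[1](s) > 0
    if tag == "and":
        return holds(phi[1], s) and holds(phi[2], s)
    return holds(phi[1], s) or holds(phi[2], s)

def strengthen(phi, eps):
    """Replace every literal g > 0 by g - eps > 0."""
    if phi[0] == "lit":
        g = phi[1]
        return ("lit", lambda s: g(s) - eps)
    return (phi[0], strengthen(phi[1], eps), strengthen(phi[2], eps))

# (x > 4) and ((y < 2) or (x > 10)), with y < 2 written as 2 - y > 0
phi = ("and", ("lit", lambda s: s["x"] - 4),
              ("or", ("lit", lambda s: 2 - s["y"]), ("lit", lambda s: s["x"] - 10)))
```

For instance, at state $x = 12, y = 3$ the robustness is $\min(8, \max(-1, 2)) = 2$, so the strengthened formula holds for $\epsilon = 1$ but not for $\epsilon = 2$.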

Apart from the robust STL model checking method, STLmc also implements several techniques to improve the usability and scalability of the tool:

— STLmc implements a generic interface to connect with various SMT solvers, such as Z3 [12], Yices2 [17], and dReal [22]. Since dReal can (approximately) deal with nonlinear ordinary differential equations (ODEs), STLmc can also support hybrid systems with nonlinear ODE dynamics.

— STLmc implements parallelized two-step SMT solving to improve scalability. Instead of directly solving the complex encoding with ODEs, we first obtain a discrete abstraction without ODEs and find satisfying scenarios. We then check the discrete refinements of such scenarios using dReal in parallel.

— STLmc provides a visualization command to draw counterexample signals and robustness degrees. Such graphs intuitively explain why the robustness degree of the formula is not greater than a given threshold, and thus greatly help in analyzing counterexamples and debugging hybrid systems.


We demonstrate the effectiveness of the STLmc tool on a number of hybrid system benchmarks (including linear, polynomial, and ODE dynamics) and nontrivial STL properties. The tool is available at https://stlmc.github.io.


2 Background: Robust STL Model Checking 


Hybrid Automata. Hybrid systems are often formalized as hybrid automata [25], defined as a tuple $H = (Q, X, init, inv, jump, flow)$. A set of modes $Q$ specifies discrete states. A set of real-valued variables $X = \{x_1, \ldots, x_l\}$ gives continuous states. A pair $(q, \vec{v})$ of a mode $q \in Q$ and a vector $\vec{v} \in \mathbb{R}^l$ constitutes a state of $H$. An initial condition $init(q, \vec{v})$ defines a set of initial states. An invariant condition $inv(q, \vec{v})$ defines a set of valid states. A jump condition $jump(q, \vec{v}, q', \vec{v}')$ defines a discrete transition from $(q, \vec{v})$ to $(q', \vec{v}')$. A flow condition $flow(q, \vec{v}, \vec{v}_t, t)$ defines a continuous evolution of $X$'s values from $\vec{v}$ to $\vec{v}_t$ over time $t$ in mode $q$.

A signal $\sigma$ represents a continuous execution of a hybrid automaton $H$, given by a function $[0, \tau) \to Q \times \mathbb{R}^l$ with a time bound $\tau > 0$. A signal $\sigma$ is called a trajectory of a hybrid automaton $H$, written $\sigma \in H$, if $\sigma$ describes a valid behavior of $H$: formally, there exists a sequence of times $0 = t_0 < t_1 < \cdots < \tau$ such that: (i) $\sigma(t_0)$ is an initial state by $init$; (ii) for $i \geq 1$, $H$'s state evolves from $\sigma(t_{i-1})$ according to $flow$, while satisfying $inv$, for each time interval $[t_{i-1}, t_i)$; and (iii) for $i \geq 1$, a discrete transition occurs by $jump$ at each time point $t_i$.
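The relational definition of hybrid automata and trajectories maps directly onto predicates. The Python sketch below is our own illustration, not STLmc's input format: it stores $init$, $inv$, $jump$ and $flow$ as Boolean functions and checks conditions (i) to (iii) on a trajectory given as a list of segments, one per interval $[t_{i-1}, t_i)$. As a simplification, $flow$ and $inv$ are only checked at segment endpoints, whereas the definition requires them over the whole interval. The one-variable automaton with modes "up" and "down" is a hypothetical toy.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class HybridAutomaton:
    modes: frozenset
    init: Callable   # init(q, v) -> bool
    inv: Callable    # inv(q, v) -> bool
    jump: Callable   # jump(q, v, q2, v2) -> bool
    flow: Callable   # flow(q, v, vt, t) -> bool

def is_trajectory(H, segments):
    """segments: list of (q, v_start, v_end, duration), one per [t_{i-1}, t_i).
    Checks (i)-(iii), with flow and inv evaluated at segment endpoints only."""
    if not segments:
        return False
    q0, v0, _, _ = segments[0]
    if not H.init(q0, v0):                                   # condition (i)
        return False
    for q, v, vt, d in segments:                             # condition (ii)
        if q not in H.modes or not (H.flow(q, v, vt, d) and H.inv(q, v) and H.inv(q, vt)):
            return False
    for (q, _, vt, _), (q2, v2, _, _) in zip(segments, segments[1:]):
        if not H.jump(q, vt, q2, v2):                        # condition (iii)
            return False
    return True

# Hypothetical toy: x rises at rate 1 in "up" until x = 2, then falls at rate 1 in "down"
toy = HybridAutomaton(
    modes=frozenset({"up", "down"}),
    init=lambda q, v: q == "up" and v[0] == 0.0,
    inv=lambda q, v: v[0] <= 2.0 if q == "up" else v[0] >= 0.0,
    jump=lambda q, v, q2, v2: q == "up" and q2 == "down" and v[0] >= 2.0 and v2 == v,
    flow=lambda q, v, vt, t: math.isclose(vt[0], v[0] + t if q == "up" else v[0] - t),
)
```

For example, the two segments `("up", (0.0,), (2.0,), 2.0)` and `("down", (2.0,), (0.5,), 1.5)` form a trajectory of `toy`, while jumping to "down" at $x = 1$ violates the jump condition.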
Signal Temporal Logic. Signal temporal logic (STL) is widely used to specify properties of hybrid systems [31]. The syntax of STL is defined by:

$$\varphi ::= p \mid \neg\varphi \mid \varphi \wedge \varphi \mid \varphi\, \mathbf{U}_I\, \varphi$$

where $p$ denotes state propositions, and $I \subseteq \mathbb{R}_{\geq 0}$ is any interval of nonnegative real numbers. Examples of state propositions include relational expressions of the form $f(\vec{x}) > 0$ over variables $X$ with a real-valued function $f : \mathbb{R}^l \to \mathbb{R}$. Other common Boolean and temporal operators can be derived by equivalences: e.g., $\varphi \vee \varphi' \equiv \neg(\neg\varphi \wedge \neg\varphi')$, $\Diamond_I \varphi \equiv \top\, \mathbf{U}_I\, \varphi$, $\Box_I \varphi \equiv \neg \Diamond_I \neg\varphi$, etc.

We consider a quantitative semantics of STL based on robustness degrees [15]. The semantics of a state proposition $p$ is defined as a function $p : Q \times \mathbb{R}^l \to \overline{\mathbb{R}}$ that assigns to a state the degree to which $p$ is true, where $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$. Specifically, the robustness degree of a state proposition $f(\vec{x}) > 0$ is the value of $f(\vec{x})$. E.g., the robustness degree of $x > 4$ is the value of $x - 4$ at a given state. The robustness degree of an STL formula can be defined as follows [15], where a time bound $\tau$ of a signal is explicitly taken into account.¹


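Definition 1. Given an STL formula $\varphi$, a signal $\sigma : [0, \tau) \to Q \times \mathbb{R}^l$, and a time
$t \in [0, \tau)$, the robustness degree $\rho_\tau(\varphi, \sigma, t) \in \bar{\mathbb{R}}$ is defined inductively by:²

$$
\begin{aligned}
\rho_\tau(p, \sigma, t) &= p(\sigma(t)) \\
\rho_\tau(\neg\varphi, \sigma, t) &= -\rho_\tau(\varphi, \sigma, t) \\
\rho_\tau(\varphi_1 \wedge \varphi_2, \sigma, t) &= \min(\rho_\tau(\varphi_1, \sigma, t), \rho_\tau(\varphi_2, \sigma, t)) \\
\rho_\tau(\varphi_1 \,\mathbf{U}_I\, \varphi_2, \sigma, t) &= \sup_{t' \in (t + I) \cap [0, \tau)} \min\Big(\rho_\tau(\varphi_2, \sigma, t'), \inf_{t'' \in [t, t']} \rho_\tau(\varphi_1, \sigma, t'')\Big)
\end{aligned}
$$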
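Definition 1 can be evaluated directly on a sampled signal. The Python sketch below is only an approximation under stated assumptions: the signal is a list of states sampled every `dt` time units over $[0, \tau)$, sup and inf are replaced by max and min over samples, and formulas use a nested-tuple encoding of our own (not STLmc's internal representation). The helpers `eventually` and `always` follow the derived-operator equivalences given earlier.

```python
import math

# Hypothetical tuple encoding of STL formulas used only in this sketch:
#   ("prop", f)                     state proposition with robustness f(state)
#   ("not", phi)
#   ("and", phi1, phi2)
#   ("until", (a, b), phi1, phi2)   phi1 U_[a,b] phi2 (b may be math.inf)


def rho(phi, signal, i, dt):
    """Approximate robustness of phi at sample i of a signal sampled every dt.

    sup/inf over continuous time become max/min over samples; samples past the
    end of the list are ignored, which plays the role of the time bound tau.
    """
    op = phi[0]
    if op == "prop":
        return phi[1](signal[i])
    if op == "not":
        return -rho(phi[1], signal, i, dt)
    if op == "and":
        return min(rho(phi[1], signal, i, dt), rho(phi[2], signal, i, dt))
    if op == "until":
        (a, b), phi1, phi2 = phi[1], phi[2], phi[3]
        best = -math.inf                   # sup of the empty set
        for j in range(i, len(signal)):    # candidate t' = j * dt
            elapsed = (j - i) * dt
            if elapsed > b:
                break
            if elapsed < a:
                continue
            # inf over t'' in [t, t'] of the robustness of phi1
            hold = min(rho(phi1, signal, k, dt) for k in range(i, j + 1))
            best = max(best, min(rho(phi2, signal, j, dt), hold))
        return best
    raise ValueError(f"unknown operator: {op!r}")


TRUE = ("prop", lambda s: math.inf)


def eventually(interval, phi):
    """<>_I phi = true U_I phi"""
    return ("until", interval, TRUE, phi)


def always(interval, phi):
    """[]_I phi = not <>_I not phi"""
    return ("not", eventually(interval, ("not", phi)))
```

For example, on the scalar signal `[0, 1, 2, 3, 4]` with `dt = 1.0`, the proposition `("prop", lambda x: x - 2)` (i.e., $x > 2$) has robustness 2 under `eventually((0, 4), ...)` and -2 under `always((0, 4), ...)`. Sampling can miss behavior between samples; STLmc avoids this by encoding the continuous-time semantics symbolically (Sect. 4).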


The robust STL model checking problem is to determine whether the robustness
degree of an STL formula $\varphi$ is always greater than a given robustness threshold
$\varepsilon > 0$ for all possible trajectories of a hybrid automaton $H$.


Definition 2 (Robust STL Model Checking). For a time bound $\tau > 0$, an
STL formula $\varphi$ is satisfied at time $t \in [0, \tau)$ on a hybrid automaton $H$ with respect
to a robustness threshold $\varepsilon > 0$ iff for every trajectory $\sigma \in H$, $\rho_\tau(\varphi, \sigma, t) > \varepsilon$.


A Running Example. Consider two rooms interconnected by an open door. The
temperature $x_i$ of each room, $i = 0, 1$, changes depending on the heater's mode
$q_i \in \{\mathit{On}, \mathit{Off}\}$ and the temperature of the other room. The continuous dynamics
of $x_i$ can be specified as the following ODEs, where $k_i, h_i, c_i, d_i$ are determined
by the size of the room, the heater's power, and the size of the door [2,19,25]:

$$\dot{x}_i = \begin{cases} k_i\,(h_i - (c_i x_i - d_i x_{1-i})) & (\mathit{On}) \\ -k_i\,(c_i x_i - d_i x_{1-i}) & (\mathit{Off}) \end{cases}$$


¹ Cf. the Boolean semantics of STL [29,31], in which the satisfaction of an STL formula is
defined as a Boolean value (i.e., true or false).

² The Minkowski sum of intervals $I$ and $J$ is denoted by $I + J$. For a singular interval,
$\{t\} + I$ is written as $t + I$. We write $\sup_{a \in A} g(a)$ and $\inf_{a \in A} g(a)$ to denote the least
upper bound and the greatest lower bound of the set $\{g(a) \mid a \in A\}$, respectively.
[Figure: modes (Off₀, On₁), (Off₀, Off₁), and (On₀, Off₁) with their flow, invariant, and jump conditions; initial condition 18 ≤ x₀, x₁ ≤ 22]

Fig. 1. A hybrid automaton for the networked thermostats.


Figure 1 shows a hybrid automaton of our networked thermostat controllers. 
Initially, both heaters are off and the temperatures are between 18 and 22. The 
jumps between modes then define a control logic to keep the temperatures within 
a certain range, with at most one heater on at a time. We are interested in robust model checking
of nontrivial STL properties, such as: 


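$\varphi_1 = \Diamond_{[0,15]}(x_0 \geq 14 \;\mathbf{U}_{[0,\infty)}\; x_1 \leq 19)$: at some moment in the first 15 s, $x_1$ is less
than or equal to 19; until then, $x_0$ is greater than or equal to 14.

$\varphi_2 = \Box_{[2,4]}(x_0 - x_1 \geq 4 \rightarrow \Diamond_{[3,10]}\, x_0 - x_1 \leq -3)$: between 2 and 4 s, whenever
$x_0 - x_1 \geq 4$, the condition $x_0 - x_1 \leq -3$ holds at some moment between 3 and 10 s later.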
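To make the running example concrete, the following Python sketch simulates one trajectory of the thermostat automaton with forward-Euler integration, using the constants and jump guards of the input model in Fig. 2. The step size, the `(on0, on1)` tuple encoding of modes, and the rule of eagerly taking the first enabled jump are our own simplifying assumptions: the automaton is nondeterministic, and a model checker such as STLmc reasons about all of its trajectories rather than simulating one.

```python
# Constants k_i, h_i, c_i, d_i taken from the input model in Fig. 2.
K = (0.015, 0.045)
H = (100.0, 200.0)
C = (0.98, 0.97)
D = (0.01, 0.03)

# Jump guards from Fig. 2 as (source mode, guard, target mode); a mode is (on0, on1).
JUMPS = [
    ((0, 1), lambda x: x[0] <= 17, (1, 0)),
    ((0, 1), lambda x: x[1] >= 26, (0, 0)),
    ((1, 0), lambda x: x[1] <= 16, (0, 1)),
    ((1, 0), lambda x: x[0] >= 25, (0, 0)),
    ((0, 0), lambda x: x[0] <= 17, (1, 0)),
    ((0, 0), lambda x: x[1] <= 16, (0, 1)),
]


def derivative(mode, x):
    """Right-hand side of the thermostat ODEs in the given mode."""
    dx = []
    for i in (0, 1):
        loss = C[i] * x[i] - D[i] * x[1 - i]
        dx.append(K[i] * (H[i] - loss) if mode[i] == 1 else -K[i] * loss)
    return dx


def simulate(x0, x1, tau=25.0, dt=0.01):
    """Sample one trajectory on [0, tau), starting in mode (0, 0) (both heaters off)."""
    mode, x = (0, 0), [x0, x1]
    trace = []
    for step in range(int(round(tau / dt))):
        trace.append((step * dt, mode, tuple(x)))
        # Eagerly take the first enabled jump; this picks only one of the
        # automaton's many possible trajectories.
        for src, guard, dst in JUMPS:
            if src == mode and guard(x):
                mode = dst
                break
        dx = derivative(mode, x)
        x = [x[i] + dt * dx[i] for i in (0, 1)]
    return trace
```

Calling `simulate(20.0, 20.0)` starts inside the initial region $18 \leq x_0, x_1 \leq 22$ and returns 2,500 samples; no sample is ever in a mode with both heaters on, because no jump in Fig. 2 targets such a mode.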


3 The STLmc Model Checker


The STLmc tool can model check STL properties of hybrid automata, given
three parameters $\varepsilon > 0$ (robustness threshold), $\tau > 0$ (time bound), and $N \in \mathbb{N}$
(discrete bound). STLmc provides an expressive input format to easily specify a
wide range of hybrid automata. STLmc also provides a visualization command
to give an intuitive description of counterexamples.


3.1 Input Format 


The input format of STLmc, inspired by dReach [28], consists of five sections:
variable declarations, mode definitions, initial conditions, state propositions, and 
STL properties. Mode and continuous variables define discrete and continuous 
states of hybrid automata. Mode definitions specify flow, jump, and invariant 
conditions. STL formulas can also include user-defined state propositions. 

Figure 2 shows the input model of the hybrid automaton described in the 
running example above. Constants are introduced with the const keyword. Two 
mode variables on0 and on1 denote the heaters' modes. Continuous variables x0
and x1 are declared with domain intervals. There are three “mode blocks” that 
specify the three modes in Fig. 1 and their invariant, flow, and jump conditions. 

In mode blocks, a mode component includes a set of logic formulas over mode 
variables. An inv component contains a set of logic formulas over continuous 
variables. A flow component can include ODEs over continuous variables. A 
jump component contains a set of jump conditions of the form guard => reset, 
where guard and reset are logic formulas over mode and continuous variables, 
and “primed” variables denote states after the jump has occurred. 
const k0 = 0.015; const k1 = 0.045;
const h0 = 100;   const h1 = 200;
const c0 = 0.98;  const c1 = 0.97;
const d0 = 0.01;  const d1 = 0.03;

int on0; int on1;
[10, 35] x0; [10, 35] x1;

{ mode: on0 = 0; on1 = 1;
  inv:  10 < x0; x1 < 30;
  flow: d/dt[x0] = - k0 * (c0 * x0 - d0 * x1);
        d/dt[x1] = k1 * (h1 - (c1 * x1 - d1 * x0));
  jump: x0 <= 17 => (and (on0' = 1) (on1' = 0) (x0' = x0) (x1' = x1));
        x1 >= 26 => (and (on1' = 0) (on0' = on0) (x0' = x0) (x1' = x1));
}

{ mode: on0 = 1; on1 = 0;
  inv:  x0 < 30; x1 > 10;
  flow: d/dt[x0] = k0 * (h0 - (c0 * x0 - d0 * x1));
        d/dt[x1] = - k1 * (c1 * x1 - d1 * x0);
  jump: x1 <= 16 => (and (on0' = 0) (on1' = 1) (x0' = x0) (x1' = x1));
        x0 >= 25 => (and (on0' = 0) (on1' = on1) (x0' = x0) (x1' = x1));
}

{ mode: on0 = 0; on1 = 0;
  inv:  x0 > 10; x1 > 10;
  flow: d/dt[x0] = - k0 * (c0 * x0 - d0 * x1);
        d/dt[x1] = - k1 * (c1 * x1 - d1 * x0);
  jump: x0 <= 17 => (and (on0' = 1) (on1' = on1) (x0' = x0) (x1' = x1));
        x1 <= 16 => (and (on1' = 1) (on0' = on0) (x0' = x0) (x1' = x1));
}

init: on0 = 0; on1 = 0; 18 <= x0; x0 <= 22; 18 <= x1; x1 <= 22;

proposition:
  [p1]: x0 - x1 >= 4;
  [p2]: x0 - x1 <= -3;

goal:
  [f1]: <>[0, 15](x0 >= 14 U[0, inf) x1 <= 19);
  [f2]: [][2, 4](p1 -> <>[3, 10] p2);

Fig. 2. An input model example 


STL properties are declared in the goal section, and “named” propositions
are declared in the proposition section. State propositions are arithmetic and
relational expressions over mode and continuous variables. For example, in Fig. 2,
the STL formula f1 contains two state propositions $x_0 \geq 14$ and $x_1 \leq 19$, and
the formula f2 contains the user-defined propositions p1 and p2.


3.2 Command Line Options 


STLmc provides a command-line interface with various options, some of which are
listed in Table 1. The options -two-step and -parallel enable the two-step solving
optimization in Sect. 4.3. STLmc supports three SMT solvers to choose from based
on continuous dynamics: Z3 [12] and Yices2 [17] can deal with linear and polynomial
dynamics (solutions of ODEs are linear functions or polynomials), and dReal [22]
can approximately deal with ODE dynamics given by Lipschitz-continuous ODEs.

A discrete bound $N$ limits the number of mode changes and variable points, i.e.,
time points at which the truth value of some STL subformula changes. This is a distinctive
parameter of STL model checking that cannot typically be derived from a time
bound $\tau$ or the maximal number of jumps (say, $m$). E.g., for any positive natural
number $n \in \mathbb{N}$, consider the function $y(t) = \sin(\frac{\pi}{\tau} \cdot n \cdot t)$; the state proposition
$y > 0$ has $n - 1$ variable points even if there is no mode change ($m = 0$).³
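The claim about $y(t) = \sin(\frac{\pi}{\tau} \cdot n \cdot t)$ is easy to check numerically. The Python sketch below is an illustration, not part of STLmc: it samples $y$ at cell midpoints of $[0, \tau)$ and counts how often the truth value of $y > 0$ flips. The sample count is an arbitrary assumption that only needs to be much larger than $n$.

```python
import math


def count_variable_points(n, tau=1.0, samples=10_000):
    """Count truth-value changes of y > 0 on [0, tau) for y(t) = sin(pi / tau * n * t)."""
    # Midpoint sampling avoids evaluating y exactly at t = 0, where y(0) = 0.
    truth = [math.sin(math.pi / tau * n * ((k + 0.5) * tau / samples)) > 0
             for k in range(samples)]
    return sum(1 for a, b in zip(truth, truth[1:]) if a != b)
```

For instance, `count_variable_points(5)` counts the four zero crossings at $t = 0.2, 0.4, 0.6, 0.8$, even though the signal never changes mode.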

For the input model in Fig. 2, the following command found a counterexample
of the formula f2 at bound 2 with respect to $\varepsilon = 2$ in 15 s using dReal:


³ This example also hints that STL model checking can be arbitrarily complex even for
one mode; $\tau$ and $m$ cannot limit such model checking computation, whereas $N$ can
limit the computation involving both discrete and continuous behaviors.
Table 1. Some command line options for STLmc.

Option            Explanation               Option       Explanation
-bound (N)        a discrete bound          -two-step    enable two-step solving
-time-bound (τ)   a time bound              -parallel    parallel two-step solving
-threshold (ε)    a robustness threshold    -visualize   generate visualization data
-solver (Name)    z3, yices, or dreal       -goal        goals to be checked


[Figure: two plots over time (0 to 25 s) of the counterexample's variable values and robustness degrees, including p2]

Fig. 3. Visualization of a counterexample (horizontal dotted lines denote $\varepsilon = 2$).


$./stlmc ./therm.model -bound 5 -time-bound 25 -threshold 2 \ 
-goal f2 -solver dreal -two-step -parallel -visualize 
result: counterexample found at bound 2 (14.70277s) 


Similarly, the following command verified the formula f1 up to bounds $N = 5$
and $\tau = 25$ with respect to $\varepsilon = 0.5$ in 819 s using dReal:


$./stlmc ./therm.model -bound 5 -time-bound 25 -threshold 0.5 \ 
-goal f1 -solver dreal -two-step -parallel 
result : True (818.73110s) 


STLmc provides a command to visualize counterexamples for robust STL
model checking. It can generate images representing counterexample trajectories
and robustness degrees. Figure 3 shows the visualization graphs, which plot the
values of variables or robustness degrees over time, generated for the formula
$\mathit{f2} = \Box_{[2,4]}(x_0 - x_1 \geq 4 \rightarrow \Diamond_{[3,10]}(x_0 - x_1 \leq -3))$ with the subformulas:

$$
\begin{aligned}
\mathit{f2}_1 &= x_0 - x_1 \geq 4 \rightarrow \Diamond_{[3,10]}(x_0 - x_1 \leq -3), & \mathit{f2}_2 &= \neg(x_0 - x_1 \geq 4), \\
\mathit{f2}_3 &= \Diamond_{[3,10]}(x_0 - x_1 \leq -3), & p1 &= x_0 - x_1 \geq 4, \quad p2 = x_0 - x_1 \leq -3
\end{aligned}
$$


The robustness degree of f2 is less than $\varepsilon$ at time 0, since the robustness degree
of $\mathit{f2}_1$ goes below $\varepsilon$ in the interval $[2,4]$, which is because the degrees of both
$\mathit{f2}_2$ and $\mathit{f2}_3$ are less than $\varepsilon$ in $[2,4]$. The robustness degree of $\mathit{f2}_3$ is less than $\varepsilon$
in $[2,4]$, since the robustness degree of p2 is less than $\varepsilon$ in $[5,14] = [2,4] + [3,10]$.
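The last step uses the Minkowski sum from footnote 2. A tiny Python sketch (function names are ours; closed intervals are represented as `(lo, hi)` pairs) computes $[2,4] + [3,10]$ and checks that every window $t + [3,10]$ inspected by $\Diamond_{[3,10]}$ for $t \in [2,4]$ lies inside $[5,14]$:

```python
def minkowski_sum(i, j):
    """Minkowski sum I + J = {a + b | a in I, b in J} of closed intervals (lo, hi)."""
    return (i[0] + j[0], i[1] + j[1])


def shift_interval(t, j):
    """t + J, i.e., the Minkowski sum {t} + J for a single time point t."""
    return minkowski_sum((t, t), j)


window = minkowski_sum((2, 4), (3, 10))
print(window)  # (5, 14)

# Every window t + [3, 10] with t in [2, 4] stays inside [5, 14].
for t in (2, 2.5, 3, 3.5, 4):
    lo, hi = shift_interval(t, (3, 10))
    assert window[0] <= lo and hi <= window[1]
```

So if p2 stays below $\varepsilon$ throughout $[5,14]$, no window available to $\mathit{f2}_3$ during $[2,4]$ can push its robustness up to $\varepsilon$.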
Fig. 4. The STLmc architecture


4 Algorithms and Implementation 


Figure 4 shows the architecture of the STLmc tool. The tool first reduces robust
STL model checking to Boolean STL model checking using $\varepsilon$-strengthening. It
then applies an existing SMT-based STL model checking algorithm [3,29]. The
satisfiability of the SMT encoding can be checked directly using an SMT solver,
or using the two-step solving algorithm to improve the performance for ODE
dynamics. Our tool is implemented in around 9,500 lines of Python code.


4.1 Reduction to Boolean STL Model Checking 


As usual for model checking, robust STL model checking amounts to searching for
a counterexample. Specifically, an STL formula $\varphi$ is not satisfied on a hybrid
automaton $H$ with respect to a robustness threshold $\varepsilon > 0$ iff there exists a
counterexample for which the robustness degree of $\neg\varphi$ is greater than or equal
to $-\varepsilon$. (Formally, $\neg(\forall \sigma \in H.\ \rho_\tau(\varphi, \sigma, t) > \varepsilon)$ iff $\exists \sigma \in H.\ \rho_\tau(\neg\varphi, \sigma, t) \geq -\varepsilon$.)

Consider a state proposition $x < 0$. Its robust model checking is equivalent to
finding a counterexample $\sigma \in H$ with $\rho_\tau(x \geq 0, \sigma, t) \geq -\varepsilon$, which is equivalent to
$\rho_\tau(x \geq -\varepsilon, \sigma, t) \geq 0$. Observe that $x \geq -\varepsilon$ is weaker than $x \geq 0$ by $\varepsilon$. The notion
of $\varepsilon$-weakening was first introduced in [21] for first-order logic, and we extend the
definitions of $\varepsilon$-weakening and $\varepsilon$-strengthening to STL as follows.


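Definition 3. The $\varepsilon$-weakening $\varphi^{-\varepsilon}$ and $\varepsilon$-strengthening $\varphi^{+\varepsilon}$ of $\varphi$ are defined
as follows: $(p^{-\varepsilon})(s) = p(s) + \varepsilon$ and $(p^{+\varepsilon})(s) = p(s) - \varepsilon$ for a state $s$, and:

$$
\begin{aligned}
(\neg\varphi)^{-\varepsilon} &= \neg(\varphi^{+\varepsilon}) & (\varphi_1 \wedge \varphi_2)^{-\varepsilon} &= \varphi_1^{-\varepsilon} \wedge \varphi_2^{-\varepsilon} & (\varphi_1 \,\mathbf{U}_I\, \varphi_2)^{-\varepsilon} &= \varphi_1^{-\varepsilon} \,\mathbf{U}_I\, \varphi_2^{-\varepsilon} \\
(\neg\varphi)^{+\varepsilon} &= \neg(\varphi^{-\varepsilon}) & (\varphi_1 \wedge \varphi_2)^{+\varepsilon} &= \varphi_1^{+\varepsilon} \wedge \varphi_2^{+\varepsilon} & (\varphi_1 \,\mathbf{U}_I\, \varphi_2)^{+\varepsilon} &= \varphi_1^{+\varepsilon} \,\mathbf{U}_I\, \varphi_2^{+\varepsilon}
\end{aligned}
$$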
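Definition 3 is a purely syntactic transformation, so it can be sketched in a few lines. The Python code below uses a nested-tuple formula encoding of our own (not STLmc's representation) and implements both transformations with one signed shift: a positive `eps` gives $\varphi^{+\varepsilon}$ and a negative one gives $\varphi^{-|\varepsilon|}$. A small evaluator for non-temporal formulas then illustrates that strengthening lowers the robustness degree by exactly $\varepsilon$, which is the intuition connecting Boolean counterexamples of $\varphi^{+\varepsilon}$ to robustness thresholds.

```python
# Hypothetical tuple encoding: ("prop", f), ("not", phi), ("and", phi1, phi2),
# ("until", interval, phi1, phi2); f(state) is the robustness degree of proposition f.


def shift_formula(phi, eps):
    """Return phi^{+eps} if eps > 0 (strengthening) or phi^{-|eps|} if eps < 0 (weakening)."""
    op = phi[0]
    if op == "prop":
        f = phi[1]
        return ("prop", lambda s, f=f, eps=eps: f(s) - eps)  # p^{+eps}(s) = p(s) - eps
    if op == "not":
        return ("not", shift_formula(phi[1], -eps))            # (not phi)^{+eps} = not(phi^{-eps})
    if op == "and":
        return ("and", shift_formula(phi[1], eps), shift_formula(phi[2], eps))
    if op == "until":
        return ("until", phi[1], shift_formula(phi[2], eps), shift_formula(phi[3], eps))
    raise ValueError(f"unknown operator: {op!r}")


def rho_state(phi, s):
    """Robustness degree of a non-temporal formula at a single state s."""
    op = phi[0]
    if op == "prop":
        return phi[1](s)
    if op == "not":
        return -rho_state(phi[1], s)
    if op == "and":
        return min(rho_state(phi[1], s), rho_state(phi[2], s))
    raise ValueError("temporal operators need a whole signal, not a single state")
```

For $\varphi = (x > 4) \wedge \neg(x > 10)$ at state $x = 7$, the robustness degree is 3, and `rho_state(shift_formula(phi, 0.5), 7)` yields 2.5.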
Finding a counterexample of $\varphi$ for robust STL model checking can be reduced
to finding a counterexample of the $\varepsilon$-strengthening $\varphi^{+\varepsilon}$ for Boolean STL model
checking. The satisfaction of $\varphi$ by the Boolean STL semantics [29,31] is denoted
by $\sigma, t \models_\tau \varphi$. We have the following theorem (see our report [42] for details).




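Theorem 1. (1) $\exists \sigma \in H.\ \sigma, t \models_\tau \neg(\varphi^{+\varepsilon})$ implies $\exists \sigma \in H.\ \rho_\tau(\neg\varphi, \sigma, t) \geq -\varepsilon$,
and (2) $\forall \sigma \in H.\ \sigma, t \not\models_\tau \neg(\varphi^{+\varepsilon})$ implies $\forall \sigma \in H.\ \rho_\tau(\varphi, \sigma, t) \geq \varepsilon$.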


As a consequence, a counterexample of $\varphi^{+\varepsilon}$ for Boolean STL model checking
is also a counterexample of $\varphi$ for robust STL model checking. If there is no
counterexample of $\varphi^{+\varepsilon}$ for Boolean STL model checking, then $\varphi$ is satisfied on
$H$ with respect to any robustness threshold $0 < \varepsilon' < \varepsilon$. It is worth noting that
$\varphi$ may not be satisfied on $H$ with respect to $\varepsilon$ itself.
4.2 Boolean STL Model Checking Algorithm


For Boolean STL model checking, there exist refutationally complete bounded
model checking algorithms [3,29] with two bound parameters: $\tau$ for the time
domain, and $N$ for the number of mode changes and variable points. A time
point $t$ is a variable point if the truth value of some subformula of $\varphi$ changes at $t$. The
algorithms build an SMT encoding $\Psi^N_{H,\tau}$ of Boolean STL model checking:


Theorem 2 ([3,29]). $\Psi^N_{H,\tau}$ is satisfiable iff there is a counterexample trajectory
$\sigma \in H$, with at most $N$ variable points and mode changes, such that $\sigma, t \not\models_\tau \varphi$.


For hybrid automata with polynomial continuous dynamics, the satisfiability
of the encoding $\Psi$ can be precisely determined using standard SMT solvers,
including Z3 [12] and Yices2 [17]. For ODE dynamics, the satisfiability of $\Psi$ is
undecidable in general, but there exist specialized solvers, such as dReal [22] and
iSAT-ODE [18], that can approximately determine the satisfiability.

To support various SMT solvers, the implementation of STLmc uses a
generic wrapper interface based on the SMT-LIB standard [5]. Therefore, any new
SMT solver that follows SMT-LIB can be easily integrated with our tool.
Moreover, STLmc can also detect the most suitable solver for a given input
model; e.g., if the model has ODE dynamics, then the tool chooses dReal.
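The idea of a solver-agnostic SMT-LIB interface with automatic solver choice can be illustrated by the hypothetical Python sketch below. The tables, function names, and the logics `QF_LRA` and `QF_NRA` are our assumptions, not STLmc's actual interface; Yices is mapped to linear and polynomial models only because that is the configuration used in the experiments of Sect. 5.

```python
from __future__ import annotations

# Hypothetical sketch: these names and tables are illustrative, not STLmc's API.
SOLVER_FOR_DYNAMICS = {"linear": "yices", "polynomial": "yices", "ode": "dreal"}
LOGIC_FOR_DYNAMICS = {"linear": "QF_LRA", "polynomial": "QF_NRA"}


def choose_solver(dynamics: str) -> str:
    """Pick a solver from the kind of continuous dynamics, e.g., dReal for ODEs."""
    return SOLVER_FOR_DYNAMICS[dynamics]


def to_smtlib(dynamics: str, variables: list[str], assertions: list[str]) -> str:
    """Render a query as SMT-LIB 2 text that any compliant solver can read.

    ODE constraints need solver-specific extensions and are not covered in this sketch.
    """
    lines = [f"(set-logic {LOGIC_FOR_DYNAMICS[dynamics]})"]
    lines += [f"(declare-const {v} Real)" for v in variables]
    lines += [f"(assert {a})" for a in assertions]
    lines.append("(check-sat)")
    return "\n".join(lines) + "\n"
```

Because the query is plain SMT-LIB text, supporting another compliant solver would only require adding an entry to the dispatch table and a way to invoke it.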

The encoding $\Psi$ includes universal quantification over time, e.g., because of
invariant conditions. Several SMT solvers (including Z3 and Yices2) support these
$\exists\forall$-conditions, but at high computational cost [27]. For polynomial dynamics, we
implement the encoding method of [10] to simplify $\exists\forall$-conditions to quantifier-free
formulas. For ODE dynamics, dReal natively supports $\exists\forall$-conditions [23].


4.3 Two-Step Solving Algorithm 


To reduce the complexity of ODE dynamics, we propose a two-step solving
algorithm (Algorithm 1), inspired by the lazy SMT solving approach [38]:


1. We obtain the discrete abstraction of the encoding $\Psi$ by substituting the
flow and invariant conditions with Boolean variables. We then enumerate
satisfying scenarios $\pi$, each a conjunction of literals, where $\pi$ implies $\Psi$.

2. For each scenario $\pi$, we check the satisfiability of its refinement with
the flow and invariant conditions using dReal. If any refinement is satisfiable,
we obtain a counterexample; otherwise, there is no counterexample.


We also implement a simple method to avoid redundant scenarios by minimizing
a scenario. A scenario $\pi = l_1 \wedge \cdots \wedge l_m$ is minimal if, for every $i$, the formula
$(\neg l_i \wedge \bigwedge_{j \neq i} l_j) \rightarrow \Psi$, in which exactly one literal of $\pi$ is negated, is not valid.
To minimize a scenario $\pi$, we use a dual propagation approach [33]. Since $\pi$ implies $\Psi$,
$\pi \wedge \neg\Psi$ is unsatisfiable. We compute the unsatisfiable core of $\pi \wedge \neg\Psi$ using Z3
to extract a minimal scenario from $\pi$.
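STLmc extracts minimal scenarios from an unsatisfiable core computed by Z3. The Python sketch below swaps in a simpler technique, deletion-based minimization with a brute-force validity check over all Boolean assignments, which is feasible only for a handful of variables but makes the notion of minimality concrete. Literals as `(variable, value)` pairs and $\Psi$ as a Python predicate over an assignment dictionary are our own assumptions.

```python
from itertools import product


def implies(literals, psi, variables):
    """Check validity of (conjunction of literals) -> psi by enumerating all assignments."""
    for values in product((False, True), repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(env[v] == val for v, val in literals) and not psi(env):
            return False
    return True


def minimize_scenario(scenario, psi, variables):
    """Deletion-based minimization: drop each literal whose removal keeps scenario -> psi valid.

    A single pass suffices: if a literal was needed when it was tested, it is
    still needed after further literals are dropped (fewer literals form a weaker premise).
    """
    kept = list(scenario)
    for lit in list(scenario):
        trial = [l for l in kept if l != lit]
        if implies(trial, psi, variables):
            kept = trial
    return kept
```

For $\Psi = a \vee b$ and the scenario $a \wedge b \wedge \neg c$, the sketch returns the single literal $b$: it still implies $\Psi$, while $\neg b \rightarrow \Psi$ is not valid.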

We parallelize the two-step solving algorithm by checking the satisfiability
of refinements in parallel. If any of these refinements is satisfiable and a
counterexample is found, then all other jobs are terminated. If all refinements
checked in parallel are unsatisfiable, then there is no counterexample. As shown
in Sect. 5, this parallelization greatly improves the performance for the ODE cases in practice.
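The parallel variant can be sketched with Python's standard `concurrent.futures` module. Here `check_sat` is a placeholder for a solver call on one refinement and `refinements` is any iterable of refinement objects; both are assumptions. Threads keep the example short, but `Future.cancel()` only stops jobs that have not started yet, so an implementation that must terminate running solver calls, as described above, would more likely use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_refinements_in_parallel(refinements, check_sat, workers=4):
    """Return the first satisfying result, or None if every refinement is unsatisfiable.

    check_sat(r) stands in for a solver call (e.g., dReal on refinement r); it should
    return a model/assignment when r is satisfiable and None otherwise.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(check_sat, r) for r in refinements]
        for fut in as_completed(futures):
            model = fut.result()
            if model is not None:
                for other in futures:
                    other.cancel()  # drops jobs that have not started; running ones finish
                return model
    return None
```

The first satisfiable refinement to finish wins, which mirrors the early termination in the text; if every job returns `None`, the function reports that no counterexample exists.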
Algorithm 1: Two-Step SMT Solving Algorithm
Input: Hybrid automaton H, STL formula φ, threshold ε, bounds τ and N
for k = 1 to N do
    Ψ ← abstraction of the bound-k encoding without flow and inv;
    while checkSat(Ψ) is Sat do
        π ← a minimal satisfying scenario;
        π̂ ← the refinement of π with flow and inv;
        if checkSat(π̂) is Sat then
            return counterexample(result.satAssignment);
        Ψ ← Ψ ∧ ¬π;
return True;
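The inner loop of Algorithm 1 for a single bound k can be sketched in Python as follows. Everything here is a toy assumption: the abstraction is a Python predicate over Boolean variables, satisfying assignments are found by brute-force enumeration instead of an SMT solver, scenarios are full assignments rather than minimal ones, and `refine_is_sat` stands in for the dReal check of a scenario's refinement with flow and inv.

```python
from itertools import product


def find_model(abstraction, variables, blocked):
    """Brute-force search for an assignment that satisfies the abstraction and is not blocked."""
    for values in product((False, True), repeat=len(variables)):
        env = dict(zip(variables, values))
        if abstraction(env) and not any(all(env[v] == b for v, b in pi) for pi in blocked):
            return env
    return None


def two_step(abstraction, variables, refine_is_sat):
    """Lazy two-step loop for one bound: enumerate scenarios, refine, block refuted ones."""
    blocked = []  # refuted scenarios; plays the role of Psi <- Psi and not pi
    while True:
        env = find_model(abstraction, variables, blocked)
        if env is None:
            return None  # abstraction exhausted: no counterexample at this bound
        scenario = tuple(sorted(env.items()))  # full assignment, not a minimal scenario
        if refine_is_sat(scenario):
            return scenario  # the refinement is satisfiable: counterexample found
        blocked.append(scenario)
```

Blocking each refuted scenario guarantees termination here because there are finitely many assignments; Algorithm 1 additionally wraps this loop in the outer iteration over bounds k = 1, ..., N.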


5 Experimental Evaluation 


We evaluate the effectiveness of the STLmc model checker using a number of
hybrid system benchmarks and nontrivial STL properties.⁴ We use the following
models, adapted from existing benchmarks [2,6,19,20,25,34]: load management
for two batteries (Bat), two networked water tank systems (Wat), autonomous
driving of two cars (Car), a railroad gate (Rail), two networked thermostats
(Thm), a spacecraft rendezvous (Space), navigation of a vehicle (Nav), and a
filtered oscillator (Oscil). We use modified versions of these models with linear, polynomial,
or ODE dynamics to analyze the effect of different continuous dynamics. For each
model, we use three STL formulas with nested temporal operators. More details
on the benchmark models can be found in the longer report [42].

We measure the SMT encoding size and execution time for robust STL model
checking, up to discrete bound $N = 20$ for linear models, $N = 10$ for polynomial
models, and $N = 5$ for ODE models, with a timeout of 60 min. We use different
time bounds $\tau$ and robustness thresholds $\varepsilon$ for different models, since suitable values of $\tau$ and $\varepsilon$
depend on each model. As the underlying SMT solver, we use Yices for linear
and polynomial models, and dReal with precision $\delta = 0.001$ for ODE models.
We run both direct SMT solving (1-step) and two-step SMT solving (2-step).
We use 25 cores for parallelizing the two-step solving algorithm. We ran
all experiments on an Intel Xeon 2.8 GHz machine with 256 GB of memory.

The experimental results are summarized in Table 2, where $|\Psi|$ denotes the
size of the SMT encoding $\Psi$ (in thousands), measured as the number of connectives in $\Psi$. For
the model checking results, $\top$ indicates that the tool found no counterexample
up to bound $N$, and $\bot$ indicates that the tool found a counterexample at bound
$k \leq N$. For the algorithms (Alg.), we write one of the results with a better


⁴ For reachability properties, STLmc has performance similar to other SMT-based
tools, because STLmc uses the same SMT encoding. Indeed, our previous work [29]
shows that the underlying algorithm used by STLmc has performance comparable
to other tools for reachability properties. Nonetheless, our companion report [42]
also includes some experimental results comparing STLmc with four reachability
analysis tools (HyComp [9], SpaceEx [20], Flow* [8], and dReach [28]).
Table 2. Robust Bounded Model Checking of STL (Time in seconds)

Dyn.        Model  τ   STL formula                  ε     |Ψ|   Time   Result  k  Alg.    #π
Linear      Car    40  (□[3,5] p1) U[2,10] p2       0.1   2.5   7.6    ⊥       5  1-step  -
                       ?[3,10](?[5,15] p1)          0.5   10.8  559.2  ⊤       -  1-step  -
                       (?[2,5] p1) R[0,10] p2       1.0   2.5   7.8    ⊥       5  1-step  -
            Wat    20  ?[1,3](p1 R[1,10] p2)        2.5   18.8  25.1   ⊤       -  1-step  -
                       (?[1,10] p1) U[2,5] p2       0.1   1.9   4.3    ⊥       4  1-step  -
                       ◇[4,10](p1 → ?[2,5] p2)      0.01  11.2  16.3   ⊤       -  1-step  -
            Bat    30  ?[4,10](p1 → ?[4,10] p2)     0.1   12.9  119.5  ⊤       -  1-step  -
                       (?[1,5] p1) R[5,20] p2       3.5   2.8   6.0    ⊥       5  1-step  -
                       ?[4,14](p1 → ?[0,10] p2)     0.1   3.8   44.6   ⊥       8  1-step  -
            Thm    10  (?[2,10] p1) U[1,4] p2       0.5   2.0   4.4    ⊥       4  1-step  -
                       ?[0,5](p1 → ?[2,5] p2)       0.1   3.9   5.0    ⊤       -  1-step  -
                       ◇[0,10](p1 R[2,4] p2)        1.0   5.7   6.3    ⊤       -  1-step  -
Polynomial  Car    15  ?[0,4](p1 → ?[2,5] p2)       0.5   2.2   5.5    ⊥       5  1-step  -
                       (?[?,4] p1) U[0,5] p2        2.0   1.7   4.7    ⊥       8  1-step  -
                       ?[0,3](p1 U[0,5] p2)         0.1   7.3   7.7    ⊤       -  1-step  -
            Rail   20  ?[0,5](p1 U[1,3] p2)         1.0   2.3   3.0    ⊥       5  1-step  -
                       □[0,4](p1 → □[?,10] p2)      5.0   3.8   3.8    ⊤       -  1-step  -
                       (?[0,5] p1) U[?,10] p2       4.0   1.9   2.7    ⊥       4  1-step  -
ODE         Thm    25  ◇[0,15](p1 U[0,∞) p2)        0.5   1.2   818.7  ⊤       -  2-step  3,580
                       □[2,4](p1 → ◇[3,10] p2)      2.0   0.7   14.7   ⊥       2  2-step  91
                       ?[0,10](p1 R[0,∞) p2)        2.0   1.2   161.7  ⊥       4  2-step  279
            Space  5   ?[0,2](p1 → ?[0,3] p2)       1.5   0.8   278.3  ⊥       2  2-step  79
                       □[2,3](?[1,2] p1)            0.1   1.1   37.0   ⊥       3  2-step  138
                       ◇[0,4](p1 U[0,∞) p2)         0.5   1.3   716.8  ⊤       -  2-step  2,681
            Oscil  8   ?[0,3](p1 R[0,∞) p2)         0.1   1.5   108.9  ⊤       -  2-step  326
                       □[2,5](?[0,3] p1)            1.0   1.2   192.8  ⊥       3  2-step  601
                       (□[1,3] p1) R[2,5] p2        0.1   1.8   112.1  ⊥       3  2-step  258
            Nav    10  □[2,4](p1 → ?[?,?] p2)       3.0   1.2   399.3  ⊥       3  2-step  1,388
                       ?[?,4](□[3,6] p1)            2.0   1.1   332.2  ⊥       3  2-step  1,213
                       ?[1,5](p1 R[0,∞) p2)         1.0   1.4   749.6  ⊤       -  2-step  2,411


performance. For the 2-step case, we also give the number of minimal scenarios
generated (#π). In fact, two-step SMT solving timed out for all linear and
polynomial models, and direct SMT solving timed out for all ODE models.

As shown in Table 2, our tool can perform robust model checking of nontrivial
STL formulas for hybrid systems with different continuous dynamics. The ODE
cases generally take longer than the linear and polynomial cases because of the
high computational cost of ODE solving. Nevertheless, our parallelized two-step
SMT solving method works well, and all model checking analyses finish before
the timeout. In contrast, for linear and polynomial models with a larger discrete
bound $N \geq 10$, direct SMT solving is usually effective but the two-step SMT
solving method is not: there are too many scenarios, and scenario generation
does not terminate within 60 min. Therefore, the two algorithms implemented in
our tool are complementary.
6 Related Work


There exist many tools for falsifying STL properties of hybrid systems, including
Breach [14], S-TaLiRo [1], and TLTk [11]. STL falsification techniques are based
on STL monitoring [13,32] and often use stochastic optimization techniques,
such as ant-colony optimization [1], Monte-Carlo tree search [43], and deep
reinforcement learning [41]. These techniques are often quite useful for
finding counterexamples in practice but, as mentioned, cannot be used to verify
STL properties of hybrid systems.

There exist many tools for analyzing reachability properties of hybrid systems 
based on reachable-set computation, including C2E2 [16], Flow* [8], Hylaa [4], 
and SpaceEx [20]. They can be used to guarantee the correctness of invariant 
properties of the form $p \rightarrow \Box_I q$, but cannot verify general STL properties.
In contrast, STLmc uses a refutation-complete bounded STL model checking
algorithm to verify general STL properties, including complex ones. 

Our tool is also related to SMT-based tools for analyzing hybrid systems, 
including dReach [28], HyComp [9], and HybridSAL [39]. These techniques also 
focus on analyzing invariant properties of hybrid systems, but some SMT-based 
tools, such as HyComp, can verify LTL properties of hybrid systems. Unlike 
STLmc, they cannot deal with general STL properties of hybrid systems.


7 Concluding Remarks 


We have presented the STLmc tool for robust bounded model checking of STL
properties for hybrid systems. STLmc can verify that, up to given bounds, the
robustness degree of an STL formula $\varphi$ is always greater than a given robustness
threshold for all possible behaviors of a hybrid system. STLmc also provides a
convenient user interface with an intuitive counterexample visualization.

Our tool leverages the reduction from robust model checking to Boolean 
model checking, and utilizes the refutation-complete SMT-based Boolean STL 
model checking algorithm to guarantee correctness up to given bounds and find 
subtle counterexamples. STLmc can deal with hybrid systems with (nonlinear)
ODEs using dReal. We have shown using various hybrid system benchmarks
that STLmc can effectively analyze nontrivial STL properties.

Future work includes extending our tool with other hybrid system analysis 
methods, such as reachable-set computation, besides SMT-based approaches. 
Abstract. UCLID5 is a tool for the multi-modal formal modeling, verification,
and synthesis of systems. It enables one to tackle verification
problems for heterogeneous systems such as combinations of hardware 
and software, or those that have multiple, varied specifications, or sys- 
tems that require hybrid modes of modeling. A novel aspect of UCLID5 
is an emphasis on the use of syntax-guided and inductive synthesis to 
automate steps in modeling and verification. This tool paper presents 
new developments in the UCLID5 tool including new language features, 
integration with new techniques for syntax-guided synthesis and satisfia- 
bility solving, support for hyperproperties and combinations of axiomatic 
and operational modeling, demonstrations on new problem classes, and 
a robust implementation. 


1 Overview 


Tools for formal modeling and verification are typically specialized for particu- 
lar domains and for particular methods. For instance, software verification tools 
like Boogie [4] focus on modeling sequential software and Floyd-Hoare style
reasoning, while hardware verifiers like ABC [5] are specialized for sequential 
circuits and SAT-based equivalence and model checking. Specialization makes 
sense when the problems fit well within a homogeneous problem domain with 
specific verification needs. However, there is an emerging class of problems, such 
as in security and cyber-physical systems (CPS), where the systems under verifi- 
cation are heterogeneous, or the types of specifications to be verified are varied, 
or there is not a single type of model that is effective for verification. An example 
of such a problem is the verification of trusted computing platforms [37] that 
involve hardware and software components working in tandem, and where the 
properties to be checked include invariants, refinement checks, and hyperprop- 
erties. There is a need for automated formal methods and tools to handle this 
class of problems. 

UCLID5 is a system for multi-modal formal modeling, verification, and syn- 
thesis that addresses the above need. UCLID5 is multi-modal in three impor- 
tant ways. First, it permits different modes of modeling, using axiomatic and 
operational semantics, or as combinations of concurrent transition systems and
procedural code. This enables modeling systems with multiple characteristics. 
Second, it offers a varied suite of specification modes, including first-order for- 
mulas in a combination of logical theories, temporal logic, inline assertions, pre- 
and post-conditions, system invariants, and hyperproperties. Third, it supports 
the first two capabilities with a varied suite of verification techniques, including 
Floyd-Hoare style proofs, k-induction and bounded model checking (BMC), veri- 
fying hyperproperties, or using syntax-guided and inductive synthesis to provide 
more automation in tedious steps of verification, or to automate the modeling 
process (as proposed in [34]). 

The UCLID5 framework was first proposed in 2018 [35], itself a major evo- 
lution of the much older UCLID system [6], one of the first satisfiability modulo 
theories (SMT) based modeling and verification tools. Since that publication [35], 
which laid out the vision for the tool and described a preliminary implementa- 
tion, the utility of the tool has been demonstrated on several problem classes 
(e.g., [7,8, 25]), such as for verifying security across the hardware-software inter- 
face. The syntax has been extended and state-of-the-art methods for syntax- 
guided synthesis (SyGuS) have also been integrated into the tool [28], including 
new capabilities for satisfiability and synthesis modulo oracles [32]. This tool 
paper presents an overview of the latest version of UCLID5, highlighting novel 
multi-modal aspects of the tool, as well as the new features supported since 
2018 [35]. The paper is structured as follows: in Sect. 2 we give an overview of 
the UCLID5 tool; in Sect. 3 we detail different multi-modal aspects of the tool, 
as well as highlighting new features; and in Sect. 4 we present a case study using
UCLID5 to verify a Trusted Abstract Platform. We cover related work in Sect. 5. 
The new features we highlight are: 


1. Fully integrated support for synthesis across all verification modes 

2. Support for modeling with external oracles, via satisfiability and synthesis 
modulo oracles [32] 

3. New language features to support combining axiomatic and operational mod- 
eling 

4. Direct support for hyperproperties 

5. Front-end translations from Chisel/FIRRTL to UCLID5, and from RISC-V 
binaries to UCLID5, referenced in Sect. 6. 

6. New case studies: covering models for distributed CPS in Lingua Franca [23], 
and encodings of μhb specifications and verification of a Trusted Abstract
Platform described in Sects. 3.2 and 4 and in the corresponding artifact [31]. 


2 Overview of UCLID5 


In verification mode, UCLID5 reduces the question of whether a model satisfies 
a given specification to a set of constraints that can be solved by an off-the- 
shelf SMT solver. In synthesis mode, UCLID5 reduces the problem of finding 
an interpretation for an uninterpreted function such that the specification is 
satisfied to a SyGuS problem that can be solved by an off-the-shelf SyGuS
solver. In order to do so, UCLID5 performs the following main tasks, as shown 
in Fig. 1: 
Front End: UCLID5 takes models written in the UCLID5 language as input.
The command-line front-end allows user configuration, including specifying the
external SMT/SyGuS solver to be used, as well as enabling certain utilities
such as automatically converting uninterpreted functions to arrays. The parser 
builds an abstract syntax tree from the model. 


AST Passes: UCLID5 performs a number of transformations and checks on the 
abstract syntax tree, including type-checking and inlining of procedures. This 
intermediate representation supports limited control flow such as if-statements 
and switch-cases, but loops are not permitted in procedural code and are removed 
via unrolling (bounded for-loops) or replacement with user-provided invariants 
(while loops). However, unbounded control flow can be handled by representation 
as transition systems (where each module consists of a transition system with 
an initial and a next block, each represented as a separate AST). 


Symbolic Simulator: The symbolic simulator performs a simulation of the tran- 
sition system in the model, according to the verification command provided, and 
produces a set of assertions. For instance, if bounded model checking is used, 
UCLID5 will symbolically execute the main module a bounded number of times. 
UCLID5 encodes the violation of each independent verification condition as a 
separate assertion tree. 
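The bounded unrolling performed by the symbolic simulator can be illustrated with an explicit-state analogue. The Python sketch below is not how UCLID5 works internally (UCLID5 builds symbolic assertions for an SMT solver rather than enumerating states); it unrolls a deterministic transition system up to k steps from given initial states and reports the first path that violates an invariant. The function name and the Fibonacci-style system used with it are our own illustration, not the model in Fig. 2.

```python
def bmc(init_states, step, invariant, k):
    """Explicit-state bounded model checking of an invariant up to k steps.

    Returns a counterexample path (list of states) or None if the invariant
    holds on every state reachable within k steps.
    """
    frontier = [[s] for s in init_states]
    for depth in range(k + 1):
        next_frontier = []
        for path in frontier:
            if not invariant(path[-1]):
                return path  # one violated check per depth, like a separate assertion
            if depth < k:
                next_frontier.append(path + [step(path[-1])])
        frontier = next_frontier
    return None
```

With states `(a, b)`, initial state `(0, 1)`, and `step = lambda s: (s[1], s[0] + s[1])`, the invariant `a <= b` holds for 10 steps, whereas `b < 10` fails after six steps at state `(8, 13)`.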


Synth-Lib Interface: UCLID5 supports both synthesis and verification. The 
Synth-Lib interface constructs either a verification or a synthesis problem from 
the assertions generated by the symbolic simulator. The verification problems 
are passed to the SMT-LIB interface, which converts each assertion in UCLID5’s 
intermediate representation to an assertion in SMT-LIB. Similarly, the synthesis 
problems are passed to the SyGuS-IF interface, which converts each assertion 
to an assertion in SyGuS-IF. The verification and synthesis problems are then 
passed to the appropriate provided external solver and the result is reported 
back to the user. 


[Figure: front-end parser → AST transformation passes → symbolic simulator → assertion tree → SMT-LIB or SyGuS-IF interface ↔ SMT or SyGuS solver (query/model) → result + counterexample]

Fig. 1. Architecture of UCLID5
Basic UCLID5 Models. A simple UCLID5 model that computes the
Fibonacci sequence is shown in Fig. 2. UCLID5 models are contained within
modules, which comprise three parts: a system model represented using combinations
of sequential, concurrent, operational, and axiomatic modeling, as described
in Sect. 3.2; a system specification, described in Sect. 3.1; and a proof script that
specifies the verification tasks UCLID5 should perform to prove that the system
satisfies its specification, using a variety of supported verification and synthesis
techniques described in Sect. 3.1.


3 Multi-modal Language Features 


3.1 Multi-modal Verification and Synthesis 


Specification. UCLID5 supports a variety of specification types.
The standard properties supported include inline assertions and assumptions in
sequential code, pre-conditions and post-conditions for procedures, and global
axioms and invariants (both as propositional predicates and as temporal
invariants in Linear Temporal Logic (LTL)).

The latest version of UCLID5 further provides direct support for hyperinvariants
and hyperaxioms (for k-safety). This new support for hyperproperties comprises
two new language constructs: hyperaxiom and hyperinvariant. The former places an
assumption on the behavior of n instances of the module, were they instantiated
together, and the latter is an invariant over n instances of the module, which is
verified via the usual verification methods. In the predicate, a variable x from
the n-th instance of the module is referred to as x.n, and the number of
instances is determined by the maximum n appearing in the hyperinvariants and
hyperaxioms. For example, hyperinvariant [2] det_xy: y.1==y.2 asserts a
2-safety hyperproperty: that y has the same value in two instances of the module.
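To make the self-composition behind a 2-safety hyperinvariant concrete, the Python sketch below runs two copies of a hypothetical module in lockstep and checks y.1 == y.2 under the hyperaxiom x.1 == x.2. The module (`next_state`, whose output accumulates its input plus an internal nondeterministic `noise`) and the bounded, explicit-state exploration are assumptions made for illustration; UCLID5 would discharge such a check symbolically.

```python
def next_state(y, x, noise):
    """Hypothetical module step: the output y accumulates the input x plus an internal choice."""
    return y + x + noise


def check_2safety(noises, inputs=range(3), depth=3):
    """Bounded check of hyperinvariant y.1 == y.2 under hyperaxiom x.1 == x.2.

    Two copies of the module run in lockstep (self-composition). The hyperaxiom is
    encoded by feeding both copies the same input x at every step, while each copy
    resolves its internal nondeterminism (`noise`) independently.
    """
    pairs = {(0, 0)}  # (y.1, y.2) after initialization
    for _ in range(depth):
        pairs = {
            (next_state(y1, x, n1), next_state(y2, x, n2))
            for (y1, y2) in pairs
            for x in inputs
            for n1 in noises
            for n2 in noises
        }
        if any(y1 != y2 for (y1, y2) in pairs):
            return False  # a pair of traces violating the 2-safety property
    return True
```

With `noises=[0]` the module is deterministic and the check succeeds; with `noises=[0, 1]` the two copies can diverge after one step, so the hyperinvariant fails.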


Verification. To verify these specifications, we implement multiple classic
techniques. As a result, once a model is written in UCLID5, the user can deploy a
combination of verification techniques, depending on the properties targeted.
UCLID5 supports a range of verification techniques, including: Bounded Model
Checking (for LTL, hyperinvariants, and assertion-based properties); induction
and k-induction for assertion-based invariants and hyperinvariants; and
verification of pre- and post-conditions on procedures and hyperinvariants.
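As a rough illustration of how bounded model checking and k-induction fit together, the Python sketch below applies k-induction to a hypothetical four-state system by explicit enumeration (UCLID5 instead issues SMT queries). The states, the transition map `SUCC`, and the property `not_three` are invented for this example: the property holds on all reachable states but is not 1-inductive, because the unreachable state 2 steps to the bad state 3.

```python
# Hypothetical 4-state system: 0 <-> 1 is the reachable loop; 2 -> 3 -> 3 is unreachable.
STATES = range(4)
INIT = {0}
SUCC = {0: [1], 1: [0], 2: [3], 3: [3]}


def paths(length):
    """All transition-respecting state sequences with `length` transitions, from any state."""
    result = [[s] for s in STATES]
    for _ in range(length):
        result = [p + [t] for p in result for t in SUCC[p[-1]]]
    return result


def k_induction(prop, k):
    """Return 'violated', 'proved', or 'unknown' for invariant `prop` using k-induction."""
    # Base case (bounded model checking): prop holds in the first k states of every initial path.
    for p in paths(k - 1):
        if p[0] in INIT and not all(prop(s) for s in p):
            return "violated"
    # Inductive step: any k consecutive prop-states are followed by a prop-state.
    for p in paths(k):
        if all(prop(s) for s in p[:-1]) and not prop(p[-1]):
            return "unknown"  # counterexample to induction; it may be unreachable
    return "proved"


def not_three(s):
    return s != 3
```

Here `k_induction(not_three, 1)` is inconclusive because of the path 2 -> 3, while `k_induction(not_three, 2)` succeeds since state 2 has no predecessor. A property that genuinely fails, such as `s != 1`, is caught by the base case.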

As an exemplar of the utility of multi-modal verification, consider the
hyperproperty-based models verified by Sahai et al. [33]. These models use both
procedure verification and induction to verify k-trace properties.


Synthesis. The latest version of UCLID5 integrates program synthesis fully
across all the verification modes previously described. Specifically, users are
able to declare and use synthesis functions anywhere in their models, and UCLID5
will seek to automatically synthesize bodies for these functions such that the
user-selected verification task passes. In this section, we give an illustrative
example of synthesis in UCLID5, provide the necessary background on program
synthesis, and then formulate the existing verification techniques inside
UCLID5 for synthesis.
 1  module main {
 2
 3    // Part 1: System Description.
 4    var a, b : integer;
 5    init {
 6      a, b = 0, 1;
 7    }
 8    next {
 9      a', b' = b, a + b;
10    }
11
12    // Part 2: System Specification.
13    invariant a_le_b: a <= b;
14
15    // Part 3: (NEW) Synthesis Integration
16    synthesis function
17      h(x : integer, y : integer): boolean;
18    invariant hole: h(a, b);
19
20    // Part 4: Proof Script.
21    control {
22      induction;
23      check;
24      print_results;
25    }
26  }

Fig. 2. UCLID5 Fibonacci model. Part 3 shows the new synthesis syntax, and 
how to find an auxiliary invariant. 


Consider the UCLID5 model in Fig. 2. The user wants to prove by induction
that the invariant a_le_b at line 13 always holds. Unfortunately, the proof fails
because the invariant is not inductive: a state with a <= b but a < 0 (for
example, a = b = -1) steps to a' = -1 and b' = -2, which violates a_le_b.
Without synthesis, the user would need to manually strengthen the invariant until
it became inductive. Instead, the user can ask UCLID5 to do this automatically,
as Fig. 2 demonstrates on lines 16, 17, and 18. Specifically, the user declares a
function to synthesize called h at lines 16 and 17, and then uses h at line 18 to
strengthen the existing set of invariants. Given this input, UCLID5, using e.g.
cvc5 [3] as a syntax-guided synthesis engine, will automatically generate the
function h(x, y) = x >= 0, which completes the inductive proof: if a >= 0 and
a <= b, then a' = b >= a >= 0 and b' = a + b >= b = a'.
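The failed and the strengthened induction step for Fig. 2 can be replayed with the small Python sketch below. It searches a bounded box of integer states for a counterexample to induction (a state satisfying the invariant whose successor does not). The bounded search is an assumption standing in for the unbounded SMT query UCLID5 issues, so it can find counterexamples but cannot by itself prove inductiveness over all integers.

```python
def step(a, b):
    # Transition relation of the Fig. 2 model: a', b' = b, a + b
    return b, a + b


def find_cti(inv, lo=-10, hi=10):
    """Return a state (a, b) in [lo, hi]^2 where inv holds but fails after one step,
    or None if the box contains no such state. Also checks the initial state."""
    assert inv(0, 1), "invariant must hold in the initial state a, b = 0, 1"
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            if inv(a, b) and not inv(*step(a, b)):
                return (a, b)
    return None


def a_le_b(a, b):
    return a <= b


def h(x, y):
    # The body the text reports cvc5 finding: h(x, y) = x >= 0.
    return x >= 0


def strengthened(a, b):
    return a_le_b(a, b) and h(a, b)
```

On its own, `a_le_b` has a counterexample to induction (the first one found is a = b = -10), while the conjunction with `h` has none in the box, matching the argument above.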

In this example, the function to synthesize represents an inductive invariant.
However, functions to synthesize are treated exactly like any other function in
UCLID5: the user could have called h anywhere in the code. Similarly, although
this example uses induction and a global invariant, the user could also have
used a linear temporal logic (LTL) specification and bounded model checking
(BMC). In this sense, our integration is fully flexible and generic. Furthermore,
the integration scheme allows us to enable synthesis for any verification
procedure in UCLID5, simply by letting users declare and use functions to
synthesize and relying on existing SyGuS-IF solvers to carry out the automated
reasoning.


3.2 Multi-modal Modeling 


Combining Concurrent and Sequential Modeling. A unique feature of
the UCLID5 modeling language is the ability to easily combine sequential and
concurrent modeling. This allows a user to express sequential programs,
including standard control flow, procedure calls, and sequential updates, in a
sequential model, and to combine these components within a system designed for
concurrent modeling based on transition systems. The sequential program modeling
is inspired by systems such as Boogie [4] and allows the user to port Boogie
models to UCLID5. The concurrent modeling is done by defining transition systems
with a set of initial states and a transition relation. Within UCLID5, each
module is a transition system. A main module can be defined that determines when
each child module is stepped. For examples of this combination of sequential and
concurrent modeling, we refer the reader to the CPU example presented in the
original UCLID5 paper [35], which uses concurrent modules to instantiate multiple
CPU modules, modeled as transition systems, with sequential code modeling the
execution of instructions; and to the case study in Sect. 4.
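The division of labor described above can be sketched in Python: each child module is a transition system whose single step runs a block of sequential code, and the parent's transition decides which children step. The `Child` class, its counter-like body, and the explicit `schedule` argument are hypothetical; in a UCLID5 model the schedule could instead be chosen nondeterministically (for example with `havoc`, as in Fig. 3).

```python
from dataclasses import dataclass


@dataclass
class Child:
    """Hypothetical child module: a transition system whose step runs sequential code."""
    pc: int = 0
    acc: int = 0

    def next(self):
        # Sequential body, executed atomically as one transition of this module.
        if self.pc < 3:
            self.acc += self.pc
            self.pc += 1


def main_next(children, schedule):
    """Transition of the main module: step exactly the children selected by `schedule`."""
    for child, enabled in zip(children, schedule):
        if enabled:
            child.next()
```

For instance, stepping with the schedule `[True, False]` and then `[True, True]` leaves the first child with `pc == 2, acc == 1` and the second with `pc == 1, acc == 0`.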


Reasoning with External Oracles. New in the latest version, UCLID5 supports
modeling with oracle function symbols [32] in both verification and synthesis.
Namely, a user can include “oracle functions” in any UCLID5 model, where an
oracle function is a function without a provided implementation that is
associated with a user-provided external binary the solver can query. We note
that oracle functions (and functions in general) can only be first-order within
the UCLID5 modeling language, i.e., functions cannot receive functions as
arguments.

This support is useful in cases where some components of the system are 
difficult or impossible to model, but could be compiled into a binary that the 
solver can query; or where the model of the system would be challenging for an 
SMT solver to reason about (for instance, highly non-linear arithmetic), and it 
may be better to outsource that reasoning to an external binary. 

UCLID5 supports oracle function symbols in verification by interfacing with a 
solver that supports Satisfiability Modulo Theories and Oracles (SMTO) [32], and 
in synthesis by interfacing with a solver that supports Synthesis Modulo Oracles 
(SyMO) [32]. 

Oracle function symbols are declared like functions, with the keyword oracle
and an annotation pointing to the binary implementation. For instance, oracle
function [isprime] Prime (x: integer): boolean would indicate to the
solver that the binary isprime takes an integer as input and returns a boolean.
This is translated into the corresponding syntax in SMTO or SyMO, as detailed
in [30].
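The key point of an oracle function is that the solver never sees its implementation, only input/output pairs obtained by running an external program. The Python sketch below imitates this with a stand-in `isprime` program and a toy search. The command-line protocol (input as an argument, `true`/`false` on stdout), the helper names, and the search itself are assumptions for illustration; they are not how SMTO or SyMO solvers actually communicate with oracles.

```python
import subprocess
import sys

# Stand-in for the external `isprime` binary: a tiny script run by the current Python
# interpreter. Passing the input as a command-line argument and reading "true"/"false"
# from stdout is an assumed protocol, not the one used by SMTO/SyMO solvers.
ISPRIME_SRC = """
import sys
n = int(sys.argv[1])
print("true" if n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1)) else "false")
"""


def make_oracle(cmd):
    """Wrap an external program as a Boolean oracle on integers."""
    def oracle(x: int) -> bool:
        out = subprocess.run(cmd + [str(x)], capture_output=True, text=True, check=True)
        return out.stdout.strip() == "true"
    return oracle


def find_model(oracle, lo, hi):
    """Toy search for x in [lo, hi] with x % 4 == 3 and Prime(x).

    The white-box constraint is checked directly; Prime is only ever queried as a
    black box, and each answer is recorded as an input/output fact.
    """
    facts = {}
    for x in range(lo, hi + 1):
        if x % 4 != 3:
            continue
        facts[x] = oracle(x)
        if facts[x]:
            return x, facts
    return None, facts
```

With `prime = make_oracle([sys.executable, "-c", ISPRIME_SRC])`, calling `find_model(prime, 24, 40)` queries the oracle on 27 (not prime) and then 31, returning `31` together with the recorded facts.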

An exemplar of such reasoning in a synthesis file is available in the arti- 
fact [31], where we use UCLID5 to synthesize a safe and stabilizing controller 
for a Linear Time Invariant system, similar to Abate et al. [1]. 


Combining Operational and Axiomatic Modeling. UCLID5 can model
a system being verified using an operational (transition-system-based) approach,
as Fig. 2 shows. However, UCLID5 also supports modeling a system in an
axiomatic manner, whereby the system is specified as a set of properties over
traces. Any execution satisfying the properties is allowed by the system, and
any execution violating the properties is disallowed. Axiomatic modeling can
provide order-of-magnitude performance improvements over operational models
in certain cases [2], and is often well suited to systems with large amounts of
non-determinism. We provide an example of fully axiomatic modeling in the
artifact [31].
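The axiomatic style ("any execution satisfying the properties is allowed") can be illustrated on the message-passing (mp) litmus test with the Python sketch below. Candidate executions are all total orders of the four memory events, and the only axiom keeps each thread's events in program order, which amounts to assuming a sequentially consistent memory model. This choice of model and all names in the code are assumptions for illustration.

```python
from itertools import permutations

# Message-passing (mp) litmus test: thread 0 writes x then y; thread 1 reads y then x.
EVENTS = ["Wx", "Wy", "Ry", "Rx"]
PROGRAM_ORDER = [("Wx", "Wy"), ("Ry", "Rx")]


def respects_program_order(order):
    """Axiom: events of the same thread appear in program order."""
    pos = {e: i for i, e in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in PROGRAM_ORDER)


def outcome(order):
    """(value read by Ry, value read by Rx): 1 iff the matching write came first."""
    pos = {e: i for i, e in enumerate(order)}
    return int(pos["Wy"] < pos["Ry"]), int(pos["Wx"] < pos["Rx"])


def allowed_outcomes(axioms):
    # Every candidate execution that satisfies all axioms is allowed; the rest are not.
    return {outcome(o) for o in permutations(EVENTS) if all(ax(o) for ax in axioms)}
```

With the program-order axiom, the outcomes are `(0, 0)`, `(0, 1)`, and `(1, 1)`; the forbidden mp outcome `(1, 0)` (seeing the new y but the old x) appears only when the axiom is dropped.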

Uniquely, UCLID5 also allows users to specify multi-modal systems using a
combination of operational and axiomatic modeling. In such models, some
constraints on the execution are enforced by the initial state and transition
relation (operational modeling), while others are enforced through axiomatic
invariants (axiomatic modeling). This allows the user to choose the mode of
modeling most appropriate to each constraint. For example, the ILA-MCM work [39]
combined operational ILA (Instruction-Level Abstraction) models, which describe
the functional behavior of processing elements, with memory consistency model
(MCM) orderings, which are more naturally specified axiomatically [2]. (MCM
orderings constrain shared-memory communication and synchronization between
multiple processing elements.) The combined model, used for System-on-Chip
verification, worked by sharing variables (called “facets”) between the two
models. UCLID5 makes such a combination much easier to perform.

Figure 3 depicts parts of a UCLID5 model of microarchitectural execution
that uses both operational and axiomatic modeling (similar to that from the
ILA-MCM work), based on the µspec specifications of COATCheck [24]. In this
model, the steps of instruction execution are driven by the init and next blocks,
i.e., the operational component of the model. Multiple instructions can step at
any time (curTime denotes the current time in the execution), but each
instruction can take only one step per timestep. Meanwhile, axioms such as
fifoFetch enforce ordering between the execution of multiple instructions.
Specifically, the fifoFetch axiom enforces that instructions on the same core
must be fetched in program order. (Enforcing this order is tricky using
operational modeling alone.) The transition rules and axioms operate over the
same data structures, ensuring that executions of the final model abide by both
sets of constraints.
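The interplay in Fig. 3 can be approximated in Python: an operational part nondeterministically chooses which instructions step at each timestep, and the fifoFetch axiom then filters the resulting traces. In this sketch the sizes, the restriction to the fetch stage, and the reading of EdgeExists(i.Fetch, j.Fetch) as "i is fetched at a strictly earlier time than j" are all simplifying assumptions.

```python
from itertools import product

N_INSTRS, STEPS = 3, 3  # hypothetical sizes; instructions 0, 1, 2 are in program order on one core


def operational_traces():
    """Operational part: at each timestep, any subset of instructions may step.

    An instruction is fetched (and timestamped) the first time it steps; this toy
    model ignores all later pipeline stages.
    """
    traces = set()
    for masks in product(product([False, True], repeat=N_INSTRS), repeat=STEPS):
        fetch_time = [None] * N_INSTRS
        for cur_time, mask in enumerate(masks):
            for i, stepped in enumerate(mask):
                if stepped and fetch_time[i] is None:
                    fetch_time[i] = cur_time
        traces.add(tuple(fetch_time))
    return traces


def fifo_fetch(fetch_time):
    """Axiom: if i precedes j in program order and j was fetched, i was fetched strictly earlier."""
    return all(
        fetch_time[j] is None
        or (fetch_time[i] is not None and fetch_time[i] < fetch_time[j])
        for i in range(N_INSTRS)
        for j in range(i + 1, N_INSTRS)
    )


def allowed_traces():
    # The combined model keeps exactly those operational traces that satisfy the axiom.
    return {t for t in operational_traces() if fifo_fetch(t)}
```

The operational part alone admits out-of-order fetches such as `(1, 0, None)`, while the combined model keeps only the eight traces in which the fetched instructions form a program-order prefix with increasing times; the only complete one is `(0, 1, 2)`.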

µspec models routinely function by grounding quantifiers over a finite
set of instructions. Thus, to fully support µspec axiomatic modeling, we
introduce two new language features: groups and finite quantifiers. A group
is a set of objects of a single type. A group can have any number of elements,
but it must be finite, and the group is immutable once created. For instance,
the group testInstrs in Fig. 3 consists of four instructions. Finite quantifiers,
meanwhile, are used to quantify over group elements.
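Because a group is finite and immutable, a finite quantifier over it can be expanded into a plain conjunction, which is what grounding means here. The Python sketch below performs this expansion for the fifoFetch axiom over a four-element group, producing an SMT-LIB-like, quantifier-free formula. The helper names and the term syntax (`(Fetch i)`, `ProgramOrder`, `NodeExists`, `EdgeExists` as uninterpreted symbols) are illustrative assumptions, not UCLID5's internal encoding.

```python
from itertools import product


def ground_finite_forall(group, body):
    """Expand `finite forall i in G, finite forall j in G: body(i, j)` into an
    explicit conjunction; this is possible because G is finite and immutable."""
    conjuncts = [body(i, j) for i, j in product(sorted(group), repeat=2)]
    return "(and " + " ".join(conjuncts) + ")"


def fifo_fetch_body(i, j):
    # Quantifier-free instance of the fifoFetch axiom body, in SMT-LIB-like syntax.
    return (f"(=> (and (ProgramOrder {i} {j}) (NodeExists (Fetch {j}))) "
            f"(EdgeExists (Fetch {i}) (Fetch {j})))")


test_instrs = frozenset({"i1", "i2", "i3", "i4"})  # immutable, like the group testInstrs
```

For the four instructions of `testInstrs`, the nested finite quantifiers become a conjunction of 4 × 4 = 16 implications, with no quantifier left for the solver to instantiate.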

This example showcases UCLID5’s highly flexible multi-modal modeling 
capability. Models can be purely operational, purely axiomatic, or a combination 
of the two. Note that axiomatic modeling relies on the new language features 
finite forall and groups. For a further example of axiomatic and operational 
multi-modal modeling, we refer the reader to the case study checking reachability 
properties in reactive embedded systems described in the artifact [31]. 
module main {
  <type declarations>
  var i1, i2, i3, i4 : microop_t;
  <set i1-i4 to be the instructions of a test, like mp>
  group testInstrs : microop_t = {i1, i2, i3, i4};

  // Vars to decide which instrs to step and when.
  var next1, next2, next3, next4 : boolean;
  var curTime : integer;

  init {
    i1.Fetch.nExists = false; i1.Execute.nExists = false;
    ...
  }

  // Axiom enforcing that instructions are fetched in order.
  axiom fifoFetch
    finite forall (i : microop_t) in testInstrs
      finite forall (j : microop_t) in testInstrs
        (ProgramOrder(i, j) && NodeExists(j.Fetch)) ==>
          EdgeExists(i.Fetch, j.Fetch);

  procedure stepInst(index : integer)
    returns (instr_next : microop_t)
  {
    // Steps instr@index, unless it has completed.
    case
      (index == 1) : {
        instr_next = i1;
        if (!instr_next.Fetch.nExists) {
          instr_next.Fetch.nExists = true;
          instr_next.Fetch.nTime = curTime;
        } else {
          ...
        }
      }
      ...
    esac
  }

  next {
    // Increment the current timestamp and
    // nondeterministically step instructions.
    curTime' = curTime + 1;
    havoc next1, next2, next3, next4;
    if (next1) { call (i1') = stepInst(1); }
    if (next2) { call (i2') = stepInst(2); }
    if (next3) { call (i3') = stepInst(3); }
    if (next4) { call (i4') = stepInst(4); }
  }
}


Fig. 3. UCLID5 model that incorporates both operational modeling (through 
the init and next blocks) and axiomatic modeling (through the axiom keyword). 


4 Case Study: TAP Model 


The final case study we wish to describe verifies a model of a trusted execution
environment. Trusted execution environments [10, 11, 17, 20] often provide a
software interface for users to execute enclaves, using hardware primitives to
enforce memory isolation. In contrast to software, which requires reasoning about
sequential code, hardware modeling uses a paradigm that permits concurrent
updates to a system. Moreover, verifying hyperproperties such as integrity
requires reasoning about multiple instances of a system, a task for which most
existing tools are not well suited. In this section, we present the UCLID5 port¹
of the Trusted Abstract Platform (TAP), which was originally² written in Boogie
and introduced by Subramanyan et al. [37] to model an abstract idealized trusted
enclave platform. We demonstrate how UCLID5's multi-modal support alleviates
the difficulties of modeling TAP in existing tools.

¹ https://github.com/uclid-org/trusted-abstract-platform/


module tap {
  // State variable declarations
  var tap_enclave_metadata_valid: tap_enclave_metadata_valid_t;
  var tap_enclave_metadata_addr_map: tap_enclave_metadata_addr_map_t;
  ...

  // Enclave operations
  procedure launch(eid: tap_enclave_id_t, ...) { ... }

  init { ... } // initialize TAP

  next { // step the system
    case
      (tap_current_mode == mode_untrusted) : {
        call (...) = AdversarialStep(...);
      }
      (tap_current_mode == mode_enclave) : {
        call (...) = EnclaveStep(...);
      }
    esac
  }
}

module integrity_proof {
  // Create two instances of the TAP model
  instance tap_1: tap(...);
  instance tap_2: tap(...);
  // Example invariant: memory that is mapped is equal between the two traces
  invariant equal_mem: (forall (pa : wap_addr_t) ::
    e_excl_map[pa] ==> (tap_1.mem[pa] == tap_2.mem[pa]));

  init { ... } // initialize proof
  next { // step the system
    next(tap_1); next(tap_2);
  }

  control {
    v = induction;
    check;
  }
}


Fig. 4. UCLID5 transition system-styled model of TAP and the integrity proof. 


Modeling the TAP and Proving Integrity. The UCLID5 model of TAP in 
Fig. 4 demonstrates some of UCLID5’s key features: the enclave operations of the 
TAP model (e.g. launch) are implemented as procedures, and a transition rela- 
tion of the TAP is defined using a next block that either executes an untrusted 
adversary operation or the trusted enclave, which in turn executes one of the 
enclave operations atomically. Proving the integrity hyperproperty on the TAP 
thus only requires two instantiations of the TAP model, specifying the integrity 
invariants, and defining a next block which steps each of the TAP instances 
as shown in the integrity_proof module. The integrity proof in UCLID5 uses 
inductive model checking. 
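The shape of the integrity_proof module (two instances stepped together, identical enclave behavior, independent adversaries, and an equal_mem-style invariant over enclave-owned memory) can be mimicked in a few lines of Python. Everything here is a drastic simplification and an assumption: a three-word memory, a single enclave-owned address `EXCL`, both instances always in the same mode, and a bounded explicit-state search instead of UCLID5's inductive proof.

```python
from itertools import product

EXCL = frozenset({0})  # hypothetical enclave-owned address (cf. e_excl_map)
ADDRS, VALS = range(3), range(2)


def step(mem, mode, addr, val, isolated=True):
    """One toy TAP transition on a memory tuple (a drastic simplification of TAP)."""
    mem = list(mem)
    if mode == "enclave" and addr in EXCL:
        mem[addr] = val  # the enclave writes its own memory
    elif mode == "untrusted" and (addr not in EXCL or not isolated):
        mem[addr] = val  # the adversary is blocked from enclave memory only if isolated
    return tuple(mem)


def integrity_holds(isolated=True, depth=2):
    """Explore two instances stepped in lockstep and check an equal_mem-style invariant.

    Both instances perform the same enclave action, but each adversary acts
    independently; enclave-owned memory must stay equal across the two traces.
    """
    frontier = {((0,) * len(ADDRS), (0,) * len(ADDRS))}
    for _ in range(depth):
        successors = set()
        for m1, m2 in frontier:
            for a, v in product(ADDRS, VALS):
                successors.add((step(m1, "enclave", a, v, isolated),
                                step(m2, "enclave", a, v, isolated)))
            for (a1, v1), (a2, v2) in product(product(ADDRS, VALS), repeat=2):
                successors.add((step(m1, "untrusted", a1, v1, isolated),
                                step(m2, "untrusted", a2, v2, isolated)))
        frontier = successors
        if any(m1[p] != m2[p] for m1, m2 in frontier for p in EXCL):
            return False
    return True
```

With isolation in place the invariant survives every explored pair of traces; if the adversary may write enclave memory (`isolated=False`), two different adversarial writes to address 0 break it after a single step.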


² https://github.com/0tcb/TAP
Results and Statistics of the TAP Modules. Table 1 shows the approximate size
of the TAP model in both Boogie and UCLID5. #pr, #fn, #an, and #ln refer to the
number of procedures, functions, annotations, and lines of code, respectively.
Annotations are the number of loop invariants, assertions, assumptions, and
pre- and post-conditions that were manually specified. The verification time
includes compilation and solving.

Table 1. Boogie vs UCLID5 Model Results

                            Size                Verif.
  Model/Proof     #pr   #fn   #an    #ln     Time (s)
  Boogie
    TAP            22    25   254   1840         51
    Integrity      14    11    71    835        346
  UCLID5
    TAP            53    25    87   2765         49
    Integrity       2     0    54    293         30

While the #ln for the TAP model in UCLID5 is higher than that of the Boogie
model due to stylistic differences, the crucial difference is in the integrity
proof. The original Boogie model implements both the TAP model and the integrity
proof as procedures, with the transition of the TAP model implemented as a while
loop. Because Boogie lacks support for modeling transition systems, a
hyperproperty such as integrity requires duplicate state variables, context
switching, and additional procedures for the new variables, which makes the model
difficult to maintain and self-composition unwieldy. In UCLID5, the proof is no
longer implemented as a procedure; instead, we create instances of the TAP model.
We also note that UCLID5 requires fewer annotations than Boogie for both the TAP
model and the proof. Additionally, this model lends itself to more direct
verification of hyperproperties.

The verification experiments were run on a machine with a 2.6 GHz 6-core Intel
Core i7 and 16 GB of RAM running OS X. As shown in the rightmost column of
Table 1, the verification runtimes of the Boogie and UCLID5 models and proofs
are comparable.


5 Related Work 


Many verification and synthesis tools are related to UCLID5. In this brief
review, we highlight prominent examples and contrast them with UCLID5 along the
key language features described in Sect. 3.

UCLID5 allows users to combine sequential and concurrent modeling (see
Sect. 3.2). Most existing tools primarily support either sequential, e.g.,
[4,21,38], or concurrent computation modeling, e.g., [5,9,14,26,27]. Although
users of these systems can often work around a tool's modeling focus by manually
encoding other computation paradigms (for example, Dafny can be used to model
concurrent systems [22]), this is not always straightforward, and limited support
for different paradigms can manifest as limitations in downstream applications.
For example, the Serval [29] framework, based on Rosette, cannot reason about
concurrent code. UCLID5, to the best of our knowledge, is the only verification
tool natively supporting modeling with external oracles.
UCLID5 supports different kinds of specifications and verification procedures
(see Sect. 3.1). Most existing tools [5,9,21] do not support multi-modal
verification at all, and tools that do offer it do not offer the same range of
options as UCLID5. For example, [26] does not support linear temporal logic, and
[13,27] do not support hyperproperty verification.

Finally, UCLID5 supports a generic integration with program synthesis (see
Sect. 3.1), so related work includes a number of synthesis engines. The SKETCH
system [36] synthesizes expressions to fill holes in programs and has
subsequently been applied to program repair [16,19]. UCLID5 is more flexible than
this work: it allows users to declare unknown functions even in verification
annotations, and it supports multiple verification algorithms and types of
properties. Rosette [38] provides support for synthesis and verification but,
unlike UCLID5, its synthesis is limited to bounded specifications of sequential
programs, and external synthesis engines are not supported. Synthesis algorithms
have been used to assist in verification tasks, such as proving safety and
termination of loops [12] and generating invariants [15,40], but none of this
work to date integrates program synthesis fully into an existing verification
tool. Before the new synthesis integration, UCLID5 supported synthesis of
inductive invariants. The key insight of this work is to generalize the synthesis
support and to unify all synthesis tasks by reusing the verification back-end.


6 Software Project 


The source code for UCLID5 is publicly available under a BSD license³.
UCLID5 is maintained by the UCLID5 team⁴, and we welcome patches from the
community. Additional front-ends are available for UCLID5, including translators
from Firrtl [18]⁵ and from RISC-V binaries⁶ to UCLID5 models. An artifact
including the code for the case studies in this paper is available [31].